    FABind: Fast and Accurate Protein-Ligand Binding. (arXiv:2310.06763v2 [cs.LG] UPDATED)
    Modeling the interaction between proteins and ligands and accurately predicting their binding structures is a critical yet challenging task in drug discovery. Recent advancements in deep learning have shown promise in addressing this challenge, with sampling-based and regression-based methods emerging as two prominent approaches. However, these methods have notable limitations. Sampling-based methods often suffer from low efficiency due to the need for generating multiple candidate structures for selection. On the other hand, regression-based methods offer fast predictions but may experience decreased accuracy. Additionally, the variation in protein sizes often requires external modules for selecting suitable binding pockets, further impacting efficiency. In this work, we propose $\mathbf{FABind}$, an end-to-end model that combines pocket prediction and docking to achieve accurate and fast protein-ligand binding. $\mathbf{FABind}$ incorporates a unique ligand-informed pocket prediction module, which is also leveraged for docking pose estimation. The model further enhances the docking process by incrementally integrating the predicted pocket to optimize protein-ligand binding, reducing discrepancies between training and inference. Through extensive experiments on benchmark datasets, our proposed $\mathbf{FABind}$ demonstrates strong advantages in terms of effectiveness and efficiency compared to existing methods. Our code is available at $\href{https://github.com/QizhiPei/FABind}{Github}$.  ( 2 min )
    On Extreme Value Asymptotics of Projected Sample Covariances in High Dimensions with Applications in Finance and Convolutional Networks. (arXiv:2310.08150v1 [math.ST])
    Maximum-type statistics of certain functions of the sample covariance matrix of high-dimensional vector time series are studied to statistically confirm or reject the null hypothesis that a data set has been collected under normal conditions. The approach generalizes the case of the maximal deviation of the sample autocovariance function from its assumed values. Within a linear time series framework it is shown that Gumbel-type extreme value asymptotics holds true. As applications we discuss long-only minimal-variance portfolio optimization and subportfolio analysis with respect to idiosyncratic risks, ETF index tracking by sparse tracking portfolios, convolutional deep learners for image analysis, and the analysis of array-of-sensors data.  ( 2 min )
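    As a hedged aside on the quantity being generalized here: the sketch below (illustrative only) computes the maximal deviation of the sample autocovariances of a scalar series from assumed values, i.e., the special case the paper extends; the projected sample covariances of high-dimensional vector series and the Gumbel-type limit itself are beyond this snippet.

    ```python
    import numpy as np

    def max_autocov_deviation(x, assumed, max_lag):
        """Maximum absolute deviation of sample autocovariances from assumed values.
        `assumed[h]` is the hypothesized autocovariance at lag h."""
        x = np.asarray(x, dtype=float)
        n = len(x)
        xc = x - x.mean()
        devs = []
        for h in range(max_lag + 1):
            gamma_hat = np.dot(xc[: n - h], xc[h:]) / n   # biased sample autocovariance
            devs.append(abs(gamma_hat - assumed[h]))
        return max(devs)

    # Example under a white-noise null: assumed autocovariances are (1, 0, ..., 0)
    rng = np.random.default_rng(0)
    x = rng.standard_normal(500)
    print(max_autocov_deviation(x, assumed=[1.0] + [0.0] * 10, max_lag=10))
    ```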
    GRASP: Accelerating Shortest Path Attacks via Graph Attention. (arXiv:2310.07980v1 [cs.LG])
    Recent advances in machine learning (ML) have shown promise in aiding and accelerating classical combinatorial optimization algorithms. ML-based speed-ups that aim to learn in an end-to-end manner (i.e., directly output the solution) tend to trade off run time with solution quality. Therefore, solutions that are able to accelerate existing solvers while maintaining their performance guarantees are of great interest. We consider an APX-hard problem, where an adversary aims to attack shortest paths in a graph by removing the minimum number of edges. We propose the GRASP algorithm: Graph Attention Accelerated Shortest Path Attack, an ML-aided optimization algorithm that achieves run times up to 10x faster, while maintaining the quality of the solutions generated. GRASP uses a graph attention network to identify a smaller subgraph containing the combinatorial solution, thus effectively reducing the input problem size. Additionally, we demonstrate how careful representation of the input graph, including node features that correlate well with the optimization task, can highlight important structure in the optimization solution.  ( 2 min )
    GenTKG: Generative Forecasting on Temporal Knowledge Graph. (arXiv:2310.07793v1 [cs.CL])
    The rapid advancements in large language models (LLMs) have ignited interest in the temporal knowledge graph (tKG) domain, where conventional carefully designed embedding-based and rule-based models dominate. It remains an open question whether pre-trained LLMs can understand structured temporal relational data and replace these models as the foundation for temporal relational forecasting. Therefore, we bring temporal knowledge forecasting into the generative setting. However, challenges arise from the huge chasm between the complex temporal graph data structure and the sequential natural language expressions LLMs can handle, and from the enormous data sizes of tKGs and the heavy computation costs of fine-tuning LLMs. To address these challenges, we propose a novel retrieval-augmented generation framework named GenTKG that performs generative forecasting on tKGs, combining a temporal logical rule-based retrieval strategy with lightweight parameter-efficient instruction tuning. Extensive experiments have shown that GenTKG outperforms conventional methods of temporal relational forecasting under low computation resources. GenTKG also exhibits remarkable transferability, achieving strong performance on unseen datasets without re-training. Our work reveals the huge potential of LLMs in the tKG domain and opens a new frontier for generative forecasting on tKGs.  ( 2 min )
    Neural Combinatorial Optimization with Heavy Decoder: Toward Large Scale Generalization. (arXiv:2310.07985v1 [cs.LG])
    Neural combinatorial optimization (NCO) is a promising learning-based approach for solving challenging combinatorial optimization problems without specialized algorithm design by experts. However, most constructive NCO methods cannot solve problems with large-scale instance sizes, which significantly diminishes their usefulness for real-world applications. In this work, we propose a novel Light Encoder and Heavy Decoder (LEHD) model with a strong generalization ability to address this critical issue. The LEHD model can learn to dynamically capture the relationships between all available nodes of varying sizes, which is beneficial for model generalization to problems of various scales. Moreover, we develop a data-efficient training scheme and a flexible solution construction mechanism for the proposed LEHD model. By training on small-scale problem instances, the LEHD model can generate nearly optimal solutions for the Travelling Salesman Problem (TSP) and the Capacitated Vehicle Routing Problem (CVRP) with up to 1000 nodes, and also generalizes well to solve real-world TSPLib and CVRPLib problems. These results confirm our proposed LEHD model can significantly improve the state-of-the-art performance for constructive NCO. The code is available at https://github.com/CIAM-Group/NCO_code/tree/main/single_objective/LEHD.  ( 2 min )
    Variational operator learning: A unified paradigm marrying training neural operators and solving partial differential equations. (arXiv:2304.04234v2 [cs.LG] UPDATED)
    Neural operators, as novel neural architectures for fast approximation of solution operators of partial differential equations (PDEs), have shown considerable promise for future scientific computing. However, the mainstream approach to training neural operators is still data-driven, which requires an expensive ground-truth dataset from various sources (e.g., PDE solutions computed with conventional solvers, real-world experiments) in addition to the training-stage costs. From a computational perspective, marrying operator learning and specific domain knowledge to solve PDEs is an essential step in reducing dataset costs and enabling label-free learning. We propose a novel paradigm that provides a unified framework for training neural operators and solving PDEs with the variational form, which we refer to as variational operator learning (VOL). Ritz and Galerkin approaches with finite element discretization are developed for VOL to achieve matrix-free approximation of the system functional and residual; direct minimization and iterative update are then proposed as two optimization strategies for VOL. Various types of experiments based on reasonable benchmarks involving a variable heat source, Darcy flow, and variable-stiffness elasticity are conducted to demonstrate the effectiveness of VOL. With a label-free training set and a 5-label-only shift set, VOL learns solution operators with its test errors decreasing in a power law with respect to the amount of unlabeled data. To the best of the authors' knowledge, this is the first study that integrates the perspectives of the weak form and efficient iterative methods for solving sparse linear systems into the end-to-end operator learning task.  ( 3 min )
    Diffusion-based Generative AI for Exploring Transition States from 2D Molecular Graphs. (arXiv:2304.12233v3 [physics.chem-ph] UPDATED)
    The exploration of transition state (TS) geometries is crucial for elucidating chemical reaction mechanisms and modeling their kinetics. Recently, machine learning (ML) models have shown remarkable performance for prediction of TS geometries. However, they require 3D conformations of reactants and products, often with their appropriate orientations, as input, which demands substantial effort and computational cost. Here, we propose a generative approach based on the stochastic diffusion method, namely TSDiff, for prediction of TS geometries just from 2D molecular graphs. TSDiff outperformed the existing ML models with 3D geometries in terms of both accuracy and efficiency. Moreover, it enables sampling of various TS conformations, because it learns the distribution of TS geometries for diverse reactions during training. Thus, TSDiff was able to find more favorable reaction pathways with lower barrier heights than those in the reference database. These results demonstrate that TSDiff shows promising potential for efficient and reliable TS exploration.  ( 2 min )
    LLMMaps -- A Visual Metaphor for Stratified Evaluation of Large Language Models. (arXiv:2304.00457v3 [cs.CL] UPDATED)
    Large Language Models (LLMs) have revolutionized natural language processing and demonstrated impressive capabilities in various tasks. Unfortunately, they are prone to hallucinations, where the model exposes incorrect or false information in its responses, which renders diligent evaluation approaches mandatory. While LLM performance in specific knowledge fields is often evaluated based on question and answer (Q&A) datasets, such evaluations usually report only a single accuracy number for the dataset, which often covers an entire field. This field-based evaluation is problematic with respect to transparency and model improvement. A stratified evaluation could instead reveal subfields where hallucinations are more likely to occur and thus help to better assess LLMs' risks and guide their further development. To support such stratified evaluations, we propose LLMMaps as a novel visualization technique that enables users to evaluate LLMs' performance with respect to Q&A datasets. LLMMaps provide detailed insights into LLMs' knowledge capabilities in different subfields, by transforming Q&A datasets as well as LLM responses into an internal knowledge structure. Furthermore, an extension for comparative visualization allows for the detailed comparison of multiple LLMs. To assess LLMMaps, we use them to conduct a comparative analysis of several state-of-the-art LLMs, such as BLOOM, GPT-2, GPT-3, ChatGPT and LLaMa-13B, as well as two qualitative user evaluations. All necessary source code and data for generating LLMMaps to be used in scientific publications and elsewhere is available on GitHub: https://github.com/viscom-ulm/LLMMaps  ( 3 min )
    A general framework for multi-step ahead adaptive conformal heteroscedastic time series forecasting. (arXiv:2207.14219v9 [stat.ML] UPDATED)
    This paper introduces a novel model-agnostic algorithm called adaptive ensemble batch multi-input multi-output conformalized quantile regression (AEnbMIMOCQR) that enables forecasters to generate multi-step ahead prediction intervals for a fixed pre-specified miscoverage rate in a distribution-free manner. Our method is grounded in conformal prediction principles; however, it does not require data splitting and provides close to exact coverage even when the data is not exchangeable. Moreover, the resulting prediction intervals, besides being empirically valid along the forecast horizon, do not neglect heteroscedasticity. AEnbMIMOCQR is designed to be robust to distribution shifts, which means that its prediction intervals remain reliable over an unlimited period of time, without entailing retraining or imposing unrealistic strict assumptions on the data-generating process. Through methodical experimentation, we demonstrate that our approach outperforms other competitive methods on both real-world and synthetic datasets. The code used in the experimental part and a tutorial on how to use AEnbMIMOCQR can be found at the following GitHub repository: https://github.com/Quilograma/AEnbMIMOCQR.  ( 3 min )
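    For readers unfamiliar with the conformal building block: the sketch below is a minimal split conformalized quantile regression (CQR) routine using scikit-learn quantile gradient boosting. It is illustrative only; AEnbMIMOCQR's ensemble, batch multi-input multi-output structure, and adaptivity are not reproduced, and the explicit calibration split shown here is precisely what the proposed method avoids.

    ```python
    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    def cqr_intervals(X_train, y_train, X_cal, y_cal, X_test, alpha=0.1):
        """Split CQR: fit lower/upper quantile models, then widen them by the
        empirical (1 - alpha) quantile of the calibration conformity scores."""
        lo = GradientBoostingRegressor(loss="quantile", alpha=alpha / 2).fit(X_train, y_train)
        hi = GradientBoostingRegressor(loss="quantile", alpha=1 - alpha / 2).fit(X_train, y_train)
        # conformity score: how far the true value falls outside the predicted band
        scores = np.maximum(lo.predict(X_cal) - y_cal, y_cal - hi.predict(X_cal))
        q_level = min((1 - alpha) * (1 + 1 / len(y_cal)), 1.0)
        q = np.quantile(scores, q_level)
        return lo.predict(X_test) - q, hi.predict(X_test) + q
    ```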
    Conditional Mutual Information for Disentangled Representations in Reinforcement Learning. (arXiv:2305.14133v2 [cs.LG] UPDATED)
    Reinforcement Learning (RL) environments can produce training data with spurious correlations between features due to the amount of training data or its limited feature coverage. This can lead to RL agents encoding these misleading correlations in their latent representation, preventing the agent from generalising if the correlation changes within the environment or when deployed in the real world. Disentangled representations can improve robustness, but existing disentanglement techniques that minimise mutual information between features require independent features, thus they cannot disentangle correlated features. We propose an auxiliary task for RL algorithms that learns a disentangled representation of high-dimensional observations with correlated features by minimising the conditional mutual information between features in the representation. We demonstrate experimentally, using continuous control tasks, that our approach improves generalisation under correlation shifts, as well as improving the training performance of RL algorithms in the presence of correlated features.  ( 2 min )
    Bengali Document Layout Analysis -- A YOLOV8 Based Ensembling Approach. (arXiv:2309.00848v2 [cs.CV] UPDATED)
    This paper focuses on enhancing Bengali Document Layout Analysis (DLA) using the YOLOv8 model and innovative post-processing techniques. We tackle challenges unique to the complex Bengali script by employing data augmentation for model robustness. After meticulous validation set evaluation, we fine-tune our approach on the complete dataset, leading to a two-stage prediction strategy for accurate element segmentation. Our ensemble model, combined with post-processing, outperforms individual base architectures, addressing issues identified in the BaDLAD dataset. By leveraging this approach, we aim to advance Bengali document analysis, contributing to improved OCR and document comprehension; BaDLAD serves as a foundational resource for this endeavor, aiding future research in the field. Furthermore, our experiments provided key insights for incorporating new strategies into the established solution.
    Flood and Echo: Algorithmic Alignment of GNNs with Distributed Computing. (arXiv:2310.06970v2 [cs.LG] UPDATED)
    Graph Neural Networks are a natural fit for learning algorithms. They can directly represent tasks through an abstract but versatile graph structure and handle inputs of different sizes. This opens up the possibility for scaling and extrapolation to larger graphs, one of the most important advantages of an algorithm. However, this raises two core questions: i) how can we enable nodes to gather the required information in a given graph ($\textit{information exchange}$), even if it is far away, and ii) how can we design an execution framework which enables this information exchange for extrapolation to larger graph sizes ($\textit{algorithmic alignment for extrapolation}$)? We propose a new execution framework that is inspired by the design principles of distributed algorithms: Flood and Echo Net. It propagates messages through the entire graph in a wave-like activation pattern, which naturally generalizes to larger instances. Through its sparse but parallel activations it is provably more efficient in terms of message complexity. We study the proposed model and provide both empirical evidence and theoretical insights in terms of its expressiveness, efficiency, information exchange and ability to extrapolate.
    WiGenAI: The Symphony of Wireless and Generative AI via Diffusion Models. (arXiv:2310.07312v2 [cs.IT] UPDATED)
    Innovative foundation models, such as GPT-3 and stable diffusion models, have made a paradigm shift in the realm of artificial intelligence (AI) towards generative AI-based systems. In unison, from a data communication and networking perspective, AI and machine learning (AI/ML) algorithms are envisioned to be pervasively incorporated into the future generations of wireless communications systems, highlighting the need for novel AI-native solutions for the emergent communication scenarios. In this article, we outline the applications of generative AI in wireless communication systems to lay the foundations for research in this field. Diffusion-based generative models, as the new state-of-the-art paradigm of generative models, are introduced, and their applications in wireless communication systems are discussed. Two case studies are also presented to showcase how diffusion models can be exploited for the development of resilient AI-native communication systems. Specifically, we propose denoising diffusion probabilistic models (DDPM) for a wireless communication scheme with non-ideal transceivers, where a 30% improvement is achieved in terms of bit error rate. As the second application, DDPMs are employed at the transmitter to shape the constellation symbols, highlighting a robust out-of-distribution performance. Finally, future directions and open issues for the development of generative AI-based wireless systems are discussed to promote future research endeavors towards wireless generative AI (WiGenAI).
    OWAdapt: An adaptive loss function for deep learning using OWA operators. (arXiv:2305.19443v2 [cs.LG] UPDATED)
    In this paper, we propose a fuzzy adaptive loss function for enhancing deep learning performance in classification tasks. Specifically, we redefine the cross-entropy loss to effectively address class-level noise conditions, including the challenging problem of class imbalance. Our approach introduces aggregation operators, leveraging the power of fuzzy logic to improve classification accuracy. The rationale behind our proposed method lies in the iterative up-weighting of class-level components within the loss function, focusing on those with larger errors. To achieve this, we employ the ordered weighted average (OWA) operator and combine it with an adaptive scheme for gradient-based learning. Through extensive experimentation, our method outperforms other commonly used loss functions, such as the standard cross-entropy or focal loss, across various binary and multiclass classification tasks. Furthermore, we explore the influence of hyperparameters associated with the OWA operators and present a default configuration that performs well across different experimental settings.  ( 2 min )
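    A minimal sketch of the OWA aggregation idea described above, assuming class-level cross-entropy components sorted so that larger errors receive larger weights; the linearly decreasing weight vector and the absence of the adaptive gradient-based scheme are simplifying assumptions, not the paper's exact configuration.

    ```python
    import torch
    import torch.nn.functional as F

    def owa_loss(logits, targets, owa_weights):
        """Aggregate class-level cross-entropy with ordered weighted averaging (OWA)."""
        ce = F.cross_entropy(logits, targets, reduction="none")   # per-sample loss
        num_classes = logits.shape[1]
        class_losses = []
        for c in range(num_classes):
            mask = targets == c
            # classes absent from the batch contribute a zero component
            class_losses.append(ce[mask].mean() if mask.any() else ce.new_zeros(()))
        # OWA: sort class losses in descending order so larger errors get larger weights
        sorted_losses, _ = torch.sort(torch.stack(class_losses), descending=True)
        return (owa_weights * sorted_losses).sum()

    # Example with linearly decreasing weights that sum to one (an assumed choice)
    num_classes = 5
    w = torch.arange(num_classes, 0, -1, dtype=torch.float)
    w = w / w.sum()
    loss = owa_loss(torch.randn(32, num_classes), torch.randint(0, num_classes, (32,)), w)
    ```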
    ImageNomer: description of a functional connectivity and omics analysis tool and case study identifying a race confound. (arXiv:2302.00767v2 [q-bio.PE] UPDATED)
    Most packages for the analysis of fMRI-based functional connectivity (FC) and genomic data are used with a programming language interface, lacking an easy-to-navigate GUI frontend. This exacerbates two problems found in these types of data: demographic confounds and quality control in the face of high dimensionality of features. The reason is that it is too slow and cumbersome to use a programming interface to create all the necessary visualizations required to identify all correlations, confounding effects, or quality control problems in a dataset. To remedy this situation, we have developed ImageNomer, a data visualization and analysis tool that allows inspection of both subject-level and cohort-level demographic, genomic, and imaging features. The software is Python-based, runs in a self-contained Docker image, and contains a browser-based GUI frontend. We demonstrate the usefulness of ImageNomer by identifying an unexpected race confound when predicting achievement scores in the Philadelphia Neurodevelopmental Cohort (PNC) dataset. In the past, many studies have attempted to use FC to identify achievement-related features in fMRI. Using ImageNomer, we find a clear potential for confounding effects of race. Using correlation analysis in the ImageNomer software, we show that FCs correlated with Wide Range Achievement Test (WRAT) score are in fact more highly correlated with race. Investigating further, we find that whereas both FC and SNP (genomic) features can account for 10-15\% of WRAT score variation, this predictive ability disappears when controlling for race. In this work, we demonstrate the advantage of our ImageNomer GUI tool in data exploration and confound detection. Additionally, this work identifies race as a strong confound in FC data and casts doubt on the possibility of finding unbiased achievement-related features in fMRI and SNP data of healthy adolescents.
    DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies. (arXiv:2310.04610v2 [cs.AI] UPDATED)
    In the upcoming decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences. This could herald a new era of scientific exploration, bringing significant advancements across sectors from drug development to renewable energy. To answer this call, we present the DeepSpeed4Science initiative (deepspeed4science.ai), which aims to build unique capabilities through AI system technology innovations to help domain experts unlock today's biggest science mysteries. By leveraging DeepSpeed's current technology pillars (training, inference and compression) as base technology enablers, DeepSpeed4Science will create a new set of AI system technologies tailored for accelerating scientific discoveries by addressing their unique complexity beyond the common technical approaches used for accelerating generic large language models (LLMs). In this paper, we showcase the early progress we made with DeepSpeed4Science in addressing two of the critical system challenges in structural biology research.
    Conditional Sig-Wasserstein GANs for Time Series Generation. (arXiv:2006.05421v2 [cs.LG] UPDATED)
    Generative adversarial networks (GANs) have been extremely successful in generating samples from seemingly high-dimensional probability measures. However, these methods struggle to capture the temporal dependence of joint probability distributions induced by time-series data. Furthermore, long time-series data streams hugely increase the dimension of the target space, which may render generative modelling infeasible. To overcome these challenges, motivated by the autoregressive models in econometrics, we are interested in the conditional distribution of future time series given the past information. We propose the generic conditional Sig-WGAN framework by integrating Wasserstein-GANs (WGANs) with mathematically principled and efficient path feature extraction called the signature of a path. The signature of a path is a graded sequence of statistics that provides a universal description for a stream of data, and its expected value characterises the law of the time-series model. In particular, we develop the conditional Sig-$W_1$ metric, which captures the conditional joint law of time series models, and use it as a discriminator. The signature feature space enables the explicit representation of the proposed discriminators, which alleviates the need for expensive training. We validate our method on both synthetic and empirical datasets and observe that our method consistently and significantly outperforms state-of-the-art benchmarks with respect to measures of similarity and predictive ability.  ( 3 min )
    Smoothed $f$-Divergence Distributionally Robust Optimization. (arXiv:2306.14041v2 [math.OC] UPDATED)
    In data-driven optimization, sample average approximation (SAA) is known to suffer from the so-called optimizer's curse that causes an over-optimistic evaluation of the solution performance. We argue that a special type of distributionally robust optimization (DRO) formulation offers theoretical advantages in correcting for this optimizer's curse compared to simple ``margin'' adjustments to SAA and other DRO approaches: It attains a statistical bound on the out-of-sample performance, for a wide class of objective functions and distributions, that is nearly tightest in terms of exponential decay rate. This DRO uses an ambiguity set based on a Kullback-Leibler (KL) divergence smoothed by the Wasserstein or L\'evy-Prokhorov (LP) distance via a suitable distance optimization. Computationally, we also show that such a DRO, and its generalized versions using smoothed $f$-divergence, are not harder than DRO problems based on $f$-divergence or Wasserstein distances, rendering our DRO formulations both statistically optimal and computationally viable.  ( 2 min )
    Exploring the Relationship Between Model Architecture and In-Context Learning Ability. (arXiv:2310.08049v1 [cs.LG])
    What is the relationship between model architecture and the ability to perform in-context learning? In this empirical study, we take the first steps towards answering this question. In particular, we evaluate fifteen model architectures across a suite of synthetic in-context learning tasks. The selected architectures represent a broad range of paradigms, including recurrent and convolution-based neural networks, transformers, and emerging attention alternatives. We discover that all considered architectures can perform in-context learning under certain conditions. However, contemporary architectures are found to be the best performing, especially as task complexity grows. Additionally, our follow-up experiments delve into various factors that influence in-context learning. We observe varied sensitivities among architectures with respect to hyperparameter settings. Our study of training dynamics reveals that certain architectures exhibit a smooth, progressive learning trajectory, while others demonstrate periods of stagnation followed by abrupt mastery of the task. Finally, and somewhat surprisingly, we find that several emerging attention alternatives are more robust in-context learners than transformers; since such approaches have constant-sized memory footprints at inference time, this result opens the future possibility of scaling up in-context learning to vastly larger numbers of in-context examples.
    BarlowRL: Barlow Twins for Data-Efficient Reinforcement Learning. (arXiv:2308.04263v3 [cs.LG] UPDATED)
    This paper introduces BarlowRL, a data-efficient reinforcement learning agent that combines the Barlow Twins self-supervised learning framework with the Data-Efficient Rainbow (DER) algorithm. BarlowRL outperforms both DER and its contrastive counterpart CURL on the Atari 100k benchmark. BarlowRL avoids dimensional collapse by enforcing information spread to the whole space. This helps RL algorithms utilize uniformly spread state representations, which eventually results in remarkable performance. The integration of Barlow Twins with DER enhances data efficiency and achieves superior performance on RL tasks. BarlowRL demonstrates the potential of incorporating self-supervised learning techniques to improve RL algorithms.
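    For context, a sketch of the standard Barlow Twins objective that BarlowRL builds on: it pushes the cross-correlation matrix of two embedded views toward the identity, which is the mechanism that prevents dimensional collapse. The DER integration and the RL training loop are omitted; hyperparameters below are illustrative.

    ```python
    import torch

    def barlow_twins_loss(z1, z2, lambda_offdiag=5e-3, eps=1e-9):
        """z1, z2: (batch, dim) embeddings of two augmented views of the same states."""
        z1 = (z1 - z1.mean(0)) / (z1.std(0) + eps)   # standardize each dimension
        z2 = (z2 - z2.mean(0)) / (z2.std(0) + eps)
        n = z1.shape[0]
        c = (z1.T @ z2) / n                          # empirical cross-correlation matrix
        on_diag = (torch.diagonal(c) - 1).pow(2).sum()                 # invariance term
        off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()    # redundancy reduction
        return on_diag + lambda_offdiag * off_diag
    ```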
    Network Synthetic Interventions: A Causal Framework for Panel Data Under Network Interference. (arXiv:2210.11355v2 [econ.EM] UPDATED)
    We propose a generalization of the synthetic controls and synthetic interventions methodology to incorporate network interference. We consider the estimation of unit-specific potential outcomes from panel data in the presence of spillover across units and unobserved confounding. Key to our approach is a novel latent factor model that takes into account network interference and generalizes the factor models typically used in panel data settings. We propose an estimator, Network Synthetic Interventions (NSI), and show that it consistently estimates the mean outcomes for a unit under an arbitrary set of counterfactual treatments for the network. We further establish that the estimator is asymptotically normal. We furnish two validity tests for whether the NSI estimator reliably generalizes to produce accurate counterfactual estimates. We provide a novel graph-based experiment design that guarantees the NSI estimator produces accurate counterfactual estimates, and also analyze the sample complexity of the proposed design. We conclude with simulations that corroborate our theoretical findings.
    Towards Data-and Knowledge-Driven Artificial Intelligence: A Survey on Neuro-Symbolic Computing. (arXiv:2210.15889v4 [cs.AI] UPDATED)
    Neural-symbolic computing (NeSy), which pursues the integration of the symbolic and statistical paradigms of cognition, has been an active research area of Artificial Intelligence (AI) for many years. As NeSy shows promise of reconciling the advantages of reasoning and interpretability of symbolic representation and robust learning in neural networks, it may serve as a catalyst for the next generation of AI. In the present paper, we provide a systematic overview of the recent developments and important contributions of NeSy research. Firstly, we introduce the study history of this area, covering early work and foundations. We further discuss background concepts and identify key driving factors behind the development of NeSy. Afterward, we categorize recent landmark approaches along several main characteristics that underline this research paradigm, including neural-symbolic integration, knowledge representation, knowledge embedding, and functionality. Next, we briefly discuss the successful application of modern NeSy approaches in several domains. Then, we benchmark several NeSy methods on three representative application tasks. Finally, we identify the open problems together with potential future research directions. This survey is expected to help new researchers enter this rapidly evolving field and accelerate progress towards data- and knowledge-driven AI.  ( 2 min )
    GePSAn: Generative Procedure Step Anticipation in Cooking Videos. (arXiv:2310.08312v1 [cs.CV])
    We study the problem of future step anticipation in procedural videos. Given a video of an ongoing procedural activity, we predict a plausible next procedure step described in rich natural language. While most previous work focuses on the problem of data scarcity in procedural video datasets, another core challenge of future anticipation is how to account for multiple plausible future realizations in natural settings. This problem has been largely overlooked in previous work. To address this challenge, we frame future step prediction as modelling the distribution of all possible candidates for the next step. Specifically, we design a generative model that takes a series of video clips as input, and generates multiple plausible and diverse candidates (in natural language) for the next step. Following previous work, we side-step the video annotation scarcity by pretraining our model on a large text-based corpus of procedural activities, and then transfer the model to the video domain. Our experiments, both in textual and video domains, show that our model captures diversity in the next step prediction and generates multiple plausible future predictions. Moreover, our model establishes new state-of-the-art results on YouCookII, where it outperforms existing baselines on the next step anticipation. Finally, we also show that our model can successfully transfer from text to the video domain zero-shot, i.e., without fine-tuning or adaptation, and produces good-quality future step predictions from video.
    GraphControl: Adding Conditional Control to Universal Graph Pre-trained Models for Graph Domain Transfer Learning. (arXiv:2310.07365v2 [cs.LG] UPDATED)
    Graph-structured data, which models complex relationships between objects and enables various Web applications, is ubiquitous in the world. Daily influxes of unlabeled graph data on the Web offer immense potential for these applications. Graph self-supervised algorithms have achieved significant success in acquiring generic knowledge from abundant unlabeled graph data. These pre-trained models can be applied to various downstream Web applications, saving training time and improving downstream (target) performance. However, different graphs, even across seemingly similar domains, can differ significantly in terms of attribute semantics, posing difficulties, if not infeasibility, for transferring the pre-trained models to downstream tasks. Concretely speaking, for example, the additional task-specific node information in downstream tasks (specificity) is usually deliberately omitted so that the pre-trained representation (transferability) can be leveraged. This trade-off is termed the "transferability-specificity dilemma" in this work. To address this challenge, we introduce an innovative deployment module coined GraphControl, motivated by ControlNet, to realize better graph domain transfer learning. Specifically, by leveraging universal structural pre-trained models and GraphControl, we align the input space across various graphs and incorporate unique characteristics of target data as conditional inputs. These conditions will be progressively integrated into the model during fine-tuning or prompt tuning through ControlNet, facilitating personalized deployment. Extensive experiments show that our method significantly enhances the adaptability of pre-trained models on target attributed datasets, achieving a 1.4-3x performance gain. Furthermore, it outperforms training-from-scratch methods on target data by a comparable margin and exhibits faster convergence.
    Distilling Large Vision-Language Model with Out-of-Distribution Generalizability. (arXiv:2307.03135v3 [cs.CV] UPDATED)
    Large vision-language models have achieved outstanding performance, but their size and computational requirements make their deployment on resource-constrained devices and time-sensitive tasks impractical. Model distillation, the process of creating smaller, faster models that maintain the performance of larger models, is a promising direction towards the solution. This paper investigates the distillation of visual representations in large teacher vision-language models into lightweight student models using a small- or mid-scale dataset. Notably, this study focuses on open-vocabulary out-of-distribution (OOD) generalization, a challenging problem that has been overlooked in previous model distillation literature. We propose two principles from vision and language modality perspectives to enhance the student's OOD generalization: (1) better imitating the teacher's visual representation space, and carefully promoting better coherence in vision-language alignment with the teacher; (2) enriching the teacher's language representations with informative and fine-grained semantic attributes to effectively distinguish between different labels. We propose several metrics and conduct extensive experiments to investigate these techniques. The results demonstrate significant improvements in zero-shot and few-shot student performance on open-vocabulary out-of-distribution classification, highlighting the effectiveness of our proposed approaches. Poster: https://xuanlinli17.github.io/pdfs/iccv23_large_vlm_distillation_poster.pdf Code: https://github.com/xuanlinli17/large_vlm_distillation_ood  ( 2 min )
    Imitation Learning from Observation with Automatic Discount Scheduling. (arXiv:2310.07433v2 [cs.RO] UPDATED)
    Humans often acquire new skills through observation and imitation. For robotic agents, learning from the plethora of unlabeled video demonstration data available on the Internet necessitates imitating the expert without access to its action, presenting a challenge known as Imitation Learning from Observations (ILfO). A common approach to tackle ILfO problems is to convert them into inverse reinforcement learning problems, utilizing a proxy reward computed from the agent's and the expert's observations. Nonetheless, we identify that tasks characterized by a progress dependency property pose significant challenges for such approaches; in these tasks, the agent needs to initially learn the expert's preceding behaviors before mastering the subsequent ones. Our investigation reveals that the main cause is that the reward signals assigned to later steps hinder the learning of initial behaviors. To address this challenge, we present a novel ILfO framework that enables the agent to master earlier behaviors before advancing to later ones. We introduce an Automatic Discount Scheduling (ADS) mechanism that adaptively alters the discount factor in reinforcement learning during the training phase, prioritizing earlier rewards initially and gradually engaging later rewards only when the earlier behaviors have been mastered. Our experiments, conducted on nine Meta-World tasks, demonstrate that our method significantly outperforms state-of-the-art methods across all tasks, including those that are unsolvable by them.
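    A schematic sketch of the discount-scheduling idea (an assumed functional form, not the paper's exact ADS rule): the discount factor starts low so early rewards dominate, and is raised toward its final value as a proxy signal for mastery of the earlier behaviors grows.

    ```python
    def scheduled_discount(mastery, gamma_min=0.9, gamma_max=0.99):
        """Return the discount factor for the current training phase.
        `mastery` in [0, 1] is any proxy for how well the earlier behaviors are
        already imitated (a hypothetical signal chosen for illustration)."""
        mastery = min(max(mastery, 0.0), 1.0)
        return gamma_min + (gamma_max - gamma_min) * mastery
    ```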
    PromptTTS 2: Describing and Generating Voices with Text Prompt. (arXiv:2309.02285v2 [eess.AS] UPDATED)
    Speech conveys more information than text, as the same word can be uttered in various voices to convey diverse information. Compared to traditional text-to-speech (TTS) methods relying on speech prompts (reference speech) for voice variability, using text prompts (descriptions) is more user-friendly since speech prompts can be hard to find or may not exist at all. TTS approaches based on the text prompt face two main challenges: 1) the one-to-many problem, where not all details about voice variability can be described in the text prompt, and 2) the limited availability of text prompt datasets, since writing text prompts for speech requires vendors and incurs a large data labeling cost. In this work, we introduce PromptTTS 2 to address these challenges with a variation network to provide variability information of voice not captured by text prompts, and a prompt generation pipeline to utilize large language models (LLMs) to compose high-quality text prompts. Specifically, the variation network predicts the representation extracted from the reference speech (which contains full information about voice variability) based on the text prompt representation. For the prompt generation pipeline, it generates text prompts for speech with a speech language understanding model to recognize voice attributes (e.g., gender, speed) from speech and a large language model to formulate text prompts based on the recognition results. Experiments on a large-scale (44K hours) speech dataset demonstrate that compared to the previous works, PromptTTS 2 generates voices more consistent with text prompts and supports the sampling of diverse voice variability, thereby offering users more choices on voice generation. Additionally, the prompt generation pipeline produces high-quality text prompts, eliminating the large labeling cost. The demo page of PromptTTS 2 is available online.
    A Neural-preconditioned Poisson Solver for Mixed Dirichlet and Neumann Boundary Conditions. (arXiv:2310.00177v3 [math.NA] UPDATED)
    We introduce a neural-preconditioned iterative solver for Poisson equations with mixed boundary conditions. The Poisson equation is ubiquitous in scientific computing: it governs a wide array of physical phenomena, arises as a subproblem in many numerical algorithms, and serves as a model problem for the broader class of elliptic PDEs. The most popular Poisson discretizations yield large sparse linear systems. At high resolution, and for performance-critical applications, iterative solvers can be advantageous for these -- but only when paired with powerful preconditioners. The core of our solver is a neural network trained to approximate the inverse of a discrete structured-grid Laplace operator for a domain of arbitrary shape and with mixed boundary conditions. The structure of this problem motivates a novel network architecture that we demonstrate is highly effective as a preconditioner even for boundary conditions outside the training set. We show that on challenging test cases arising from an incompressible fluid simulation, our method outperforms state-of-the-art solvers like algebraic multigrid as well as some recent neural preconditioners.
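    The solver plugs a learned operator into an otherwise standard Krylov iteration. Below is a generic preconditioned conjugate gradient loop where the preconditioner is an arbitrary callable (here a Jacobi stand-in); the paper's neural preconditioner, its architecture, and any flexible-Krylov details needed for a non-symmetric learned operator are not shown.

    ```python
    import numpy as np

    def pcg(A, b, precond, tol=1e-8, max_iter=500):
        """Preconditioned conjugate gradient for SPD A; `precond` maps a residual r
        to an approximate solution of A z = r (e.g., a learned preconditioner)."""
        x = np.zeros_like(b)
        r = b - A @ x
        z = precond(r)
        p = z.copy()
        rz = float(r @ z)
        for _ in range(max_iter):
            Ap = A @ p
            alpha = rz / float(p @ Ap)
            x = x + alpha * p
            r = r - alpha * Ap
            if np.linalg.norm(r) <= tol * np.linalg.norm(b):
                break
            z = precond(r)
            rz_new = float(r @ z)
            p = z + (rz_new / rz) * p
            rz = rz_new
        return x

    # Example: 1D Dirichlet Laplacian with a simple Jacobi (diagonal) preconditioner
    n = 200
    A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    x = pcg(A, np.ones(n), precond=lambda r: r / np.diag(A))
    print(np.linalg.norm(A @ x - np.ones(n)))
    ```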
    TEMPO: Prompt-based Generative Pre-trained Transformer for Time Series Forecasting. (arXiv:2310.04948v2 [cs.LG] UPDATED)
    The past decade has witnessed significant advances in time series modeling with deep learning. While achieving state-of-the-art results, the best-performing architectures vary highly across applications and domains. Meanwhile, for natural language processing, the Generative Pre-trained Transformer (GPT) has demonstrated impressive performance via training one general-purpose model across various textual datasets. It is intriguing to explore whether GPT-type architectures can be effective for time series, capturing the intrinsic dynamic attributes and leading to significant accuracy improvements. In this paper, we propose a novel framework, TEMPO, that can effectively learn time series representations. We focus on utilizing two essential inductive biases of the time series task for pre-trained models: (i) decomposition of the complex interaction between trend, seasonal and residual components; and (ii) introducing the selection-based prompts to facilitate distribution adaptation in non-stationary time series. TEMPO expands the capability for dynamically modeling real-world temporal phenomena from data within diverse domains. Our experiments demonstrate the superior performance of TEMPO over state-of-the-art methods on a number of time series benchmark datasets. This performance gain is observed not only in standard supervised learning settings but also in scenarios involving previously unseen datasets as well as in scenarios with multi-modal inputs. This compelling finding highlights TEMPO's potential to constitute a foundational model-building framework.
    Efficient probabilistic reconciliation of forecasts for real-valued and count time series. (arXiv:2210.02286v3 [stat.ML] UPDATED)
    Hierarchical time series are common in several applied fields. The forecasts for these time series are required to be coherent, that is, to satisfy the constraints given by the hierarchy. The most popular technique to enforce coherence is called reconciliation, which adjusts the base forecasts computed for each time series. However, recent works on probabilistic reconciliation present several limitations. In this paper, we propose a new approach based on conditioning to reconcile any type of forecast distribution. We then introduce a new algorithm, called Bottom-Up Importance Sampling, to efficiently sample from the reconciled distribution. It can be used for any base forecast distribution: discrete, continuous, or in the form of samples, providing a major speedup compared to the current methods. Experiments on several temporal hierarchies show a significant improvement over base probabilistic forecasts.  ( 2 min )
    Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts. (arXiv:2310.05898v2 [cs.LG] UPDATED)
    Lion (Evolved Sign Momentum), a new optimizer discovered through program search, has shown promising results in training large AI models. It performs comparably or favorably to AdamW but with greater memory efficiency. As we can expect from the results of a random search program, Lion incorporates elements from several existing algorithms, including signed momentum, decoupled weight decay, Polyak, and Nesterov momentum, but does not fit into any existing category of theoretically grounded optimizers. Thus, even though Lion appears to perform well as a general-purpose optimizer for a wide range of tasks, its theoretical basis remains uncertain. This lack of theoretical clarity limits opportunities to further enhance and expand Lion's efficacy. This work aims to demystify Lion. Based on both continuous-time and discrete-time analysis, we demonstrate that Lion is a theoretically novel and principled approach for minimizing a general loss function $f(x)$ while enforcing a bound constraint $\|x\|_\infty \leq 1/\lambda$. Lion achieves this through the incorporation of decoupled weight decay, where $\lambda$ represents the weight decay coefficient. Our analysis is made possible by the development of a new Lyapunov function for the Lion updates. It applies to a broader family of Lion-$\kappa$ algorithms, where the $\text{sign}(\cdot)$ operator in Lion is replaced by the subgradient of a convex function $\kappa$, leading to the solution of a general composite optimization problem of $\min_x f(x) + \kappa^*(x)$. Our findings provide valuable insights into the dynamics of Lion and pave the way for further improvements and extensions of Lion-related algorithms.
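    For reference, a minimal NumPy sketch of the Lion update discussed above (sign of an interpolated momentum plus decoupled weight decay). The bound $\|x\|_\infty \leq 1/\lambda$ is consistent with the fact that the sign step has unit magnitude per coordinate, so the decay term dominates once $\lambda |x_i| > 1$; hyperparameter defaults below are illustrative.

    ```python
    import numpy as np

    def lion_step(x, grad, m, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=0.1):
        """One Lion update on parameters x with momentum state m."""
        c = beta1 * m + (1 - beta1) * grad            # interpolation used only for the update direction
        x = x - lr * (np.sign(c) + weight_decay * x)  # unit-magnitude sign step + decoupled decay
        m = beta2 * m + (1 - beta2) * grad            # momentum state update
        return x, m
    ```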
    Clustering Three-Way Data with Outliers. (arXiv:2310.05288v2 [stat.ML] UPDATED)
    Matrix-variate distributions are a recent addition to the model-based clustering field, thereby making it possible to analyze data in matrix form with complex structure such as images and time series. Due to its recent appearance, there is limited literature on matrix-variate data, with even less on dealing with outliers in these models. An approach for clustering matrix-variate normal data with outliers is discussed. The approach, which uses the distribution of subset log-likelihoods, extends the OCLUST algorithm to matrix-variate normal data and uses an iterative approach to detect and trim outliers.
    Federated Generalization via Information-Theoretic Distribution Diversification. (arXiv:2310.07171v2 [cs.LG] UPDATED)
    Federated Learning (FL) has surged in prominence due to its capability of collaborative model training without direct data sharing. However, the vast disparity in local data distributions among clients, often termed the non-Independent Identically Distributed (non-IID) challenge, poses a significant hurdle to FL's generalization efficacy. The scenario becomes even more complex when not all clients participate in the training process, a common occurrence due to unstable network connections or limited computational capacities. This can greatly complicate the assessment of the trained models' generalization abilities. While a plethora of recent studies has centered on the generalization gap pertaining to unseen data from participating clients with diverse distributions, the divergence between the training distributions of participating clients and the testing distributions of non-participating ones has been largely overlooked. In response, our paper unveils an information-theoretic generalization framework for FL. Specifically, it quantifies generalization errors by evaluating the information entropy of local distributions and discerning discrepancies across these distributions. Inspired by our deduced generalization bounds, we introduce a weighted aggregation approach and a duo of client selection strategies. These innovations aim to bolster FL's generalization prowess by encompassing a more varied set of client data distributions. Our extensive empirical evaluations reaffirm the potency of our proposed methods, aligning seamlessly with our theoretical construct.
    SpikeCLIP: A Contrastive Language-Image Pretrained Spiking Neural Network. (arXiv:2310.06488v2 [cs.NE] UPDATED)
    Spiking neural networks (SNNs) have demonstrated the capability to achieve comparable performance to deep neural networks (DNNs) in both visual and linguistic domains while offering the advantages of improved energy efficiency and adherence to biological plausibility. However, the extension of such single-modality SNNs into the realm of multimodal scenarios remains an unexplored territory. Drawing inspiration from the concept of contrastive language-image pre-training (CLIP), we introduce a novel framework, named SpikeCLIP, to address the gap between two modalities within the context of spike-based computing through a two-step recipe involving ``Alignment Pre-training + Dual-Loss Fine-tuning". Extensive experiments demonstrate that SNNs achieve comparable results to their DNN counterparts while significantly reducing energy consumption across a variety of datasets commonly used for multimodal model evaluation. Furthermore, SpikeCLIP maintains robust performance in image classification tasks that involve class labels not predefined within specific categories.
    GP-net: Flexible Viewpoint Grasp Proposal. (arXiv:2209.10404v3 [cs.RO] UPDATED)
    We present the Grasp Proposal Network (GP-net), a Convolutional Neural Network model which can generate 6-DoF grasps from flexible viewpoints, e.g. as experienced by mobile manipulators. To train GP-net, we synthetically generate a dataset containing depth-images and ground-truth grasp information. In real-world experiments, we use the EGAD evaluation benchmark to evaluate GP-net against two commonly used algorithms, the Volumetric Grasping Network (VGN) and the Grasp Pose Detection package (GPD), on a PAL TIAGo mobile manipulator. In contrast to the state-of-the-art methods in robotic grasping, GP-net can be used for grasping objects from flexible, unknown viewpoints without the need to define the workspace and achieves a grasp success of 54.4% compared to 51.6% for VGN and 44.2% for GPD. We provide a ROS package along with our code and pre-trained models at https://aucoroboticsmu.github.io/GP-net/.
    FedSym: Unleashing the Power of Entropy for Benchmarking the Algorithms for Federated Learning. (arXiv:2310.07807v1 [cs.LG])
    Federated learning (FL) is a decentralized machine learning approach where independent learners process data privately. Its goal is to create a robust and accurate model by aggregating and retraining local models over multiple rounds. However, FL faces challenges regarding data heterogeneity and model aggregation effectiveness. In order to simulate real-world data, researchers use methods for data partitioning that transform a dataset designated for centralized learning into a group of sub-datasets suitable for distributed machine learning with different data heterogeneity. In this paper, we study the currently popular data partitioning techniques and visualize their main disadvantages: the lack of precision in the data diversity, which leads to unreliable heterogeneity indexes, and the inability to incrementally challenge the FL algorithms. To resolve this problem, we propose a method that leverages entropy and symmetry to construct 'the most challenging' and controllable data distributions with gradual difficulty. We introduce a metric to measure data heterogeneity among the learning agents and a transformation technique that divides any dataset into splits with precise data diversity. Through a comparative study, we demonstrate the superiority of our method over existing FL data partitioning approaches, showcasing its potential to challenge model aggregation algorithms. Experimental results indicate that our approach gradually challenges the FL strategies, and the models trained on FedSym distributions are more distinct.
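    As a hedged illustration of the entropy ingredient (the paper's actual heterogeneity metric and symmetry-based split construction are not spelled out in the abstract): the Shannon entropy of each client's label distribution separates near-uniform local splits from highly skewed ones.

    ```python
    import numpy as np

    def label_entropy(labels, num_classes):
        """Shannon entropy (in nats) of a client's empirical label distribution."""
        counts = np.bincount(labels, minlength=num_classes).astype(float)
        p = counts / counts.sum()
        p = p[p > 0]
        return float(-(p * np.log(p)).sum())

    # Example: a near-uniform client vs. a client dominated by a single class
    rng = np.random.default_rng(0)
    uniform_client = rng.integers(0, 10, size=1000)
    skewed_client = np.concatenate([np.zeros(900, dtype=int), rng.integers(0, 10, size=100)])
    print(label_entropy(uniform_client, 10), label_entropy(skewed_client, 10))
    ```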
    GPT-4 as an Agronomist Assistant? Answering Agriculture Exams Using Large Language Models. (arXiv:2310.06225v2 [cs.AI] UPDATED)
    Large language models (LLMs) have demonstrated remarkable capabilities in natural language understanding across various domains, including healthcare and finance. For some tasks, LLMs achieve similar or better performance than trained human beings; therefore, it is reasonable to employ human exams (e.g., certification tests) to assess the performance of LLMs. We present a comprehensive evaluation of popular LLMs, such as Llama 2 and GPT, on their ability to answer agriculture-related questions. In our evaluation, we also employ RAG (Retrieval-Augmented Generation) and ER (Ensemble Refinement) techniques, which combine information retrieval, generation capabilities, and prompting strategies to improve the LLMs' performance. To demonstrate the capabilities of LLMs, we selected agriculture exams and benchmark datasets from three of the largest agriculture producer countries: Brazil, India, and the USA. Our analysis highlights GPT-4's ability to achieve a passing score on exams to earn credits for renewing agronomist certifications, answering 93% of the questions correctly and outperforming earlier general-purpose models, which achieved 88% accuracy. In one of our experiments, GPT-4 obtained the highest performance when compared to human subjects. This performance suggests that GPT-4 could potentially pass major graduate education admission tests or even earn credits for renewing agronomy certificates. We also explore the models' capacity to address general agriculture-related questions and generate crop management guidelines for Brazilian and Indian farmers, utilizing robust datasets from the Brazilian Agency of Agriculture (Embrapa) and graduate program exams from India. The results suggest that GPT-4, ER, and RAG can contribute meaningfully to agricultural education, assessment, and crop management practice, offering valuable insights to farmers and agricultural professionals.
    NECO: NEural Collapse Based Out-of-distribution detection. (arXiv:2310.06823v2 [stat.ML] UPDATED)
    Detecting out-of-distribution (OOD) data is a critical challenge in machine learning due to model overconfidence, often without awareness of their epistemological limits. We hypothesize that ``neural collapse'', a phenomenon affecting in-distribution data for models trained beyond loss convergence, also influences OOD data. To benefit from this interplay, we introduce NECO, a novel post-hoc method for OOD detection, which leverages the geometric properties of ``neural collapse'' and of principal component spaces to identify OOD data. Our extensive experiments demonstrate that NECO achieves state-of-the-art results on both small and large-scale OOD detection tasks while exhibiting strong generalization capabilities across different network architectures. Furthermore, we provide a theoretical explanation for the effectiveness of our method in OOD detection. We plan to release the code after the anonymity period.
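    One plausible reading of a principal-subspace score in this spirit (an illustrative assumption, not necessarily the exact NECO statistic): fit PCA on in-distribution features and score a test sample by how much of its norm lies inside that subspace, since OOD samples tend to place more energy outside it.

    ```python
    import numpy as np
    from sklearn.decomposition import PCA

    def fit_principal_subspace(id_features, n_components=50):
        """Fit the principal subspace of in-distribution penultimate-layer features."""
        return PCA(n_components=n_components).fit(id_features)

    def subspace_norm_ratio(features, pca):
        """Score in [0, 1]: fraction of each feature's norm captured by the ID subspace."""
        centered = features - pca.mean_
        projected = centered @ pca.components_.T      # coordinates in the principal subspace
        return np.linalg.norm(projected, axis=1) / (np.linalg.norm(centered, axis=1) + 1e-12)
    ```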
    Locality-Aware Generalizable Implicit Neural Representation. (arXiv:2310.05624v2 [cs.LG] UPDATED)
    Generalizable implicit neural representation (INR) enables a single continuous function, i.e., a coordinate-based neural network, to represent multiple data instances by modulating its weights or intermediate features using latent codes. However, the expressive power of the state-of-the-art modulation is limited due to its inability to localize and capture fine-grained details of data entities such as specific pixels and rays. To address this issue, we propose a novel framework for generalizable INR that combines a transformer encoder with a locality-aware INR decoder. The transformer encoder predicts a set of latent tokens from a data instance to encode local information into each latent token. The locality-aware INR decoder extracts a modulation vector by selectively aggregating the latent tokens via cross-attention for a coordinate input and then predicts the output by progressively decoding with coarse-to-fine modulation through multiple frequency bandwidths. The selective token aggregation and the multi-band feature modulation enable us to learn locality-aware representation in spatial and spectral aspects, respectively. Our framework significantly outperforms previous generalizable INRs and validates the usefulness of the locality-aware latents for downstream tasks such as image generation.
    Defending Our Privacy With Backdoors. (arXiv:2310.08320v1 [cs.LG])
    The proliferation of large AI models trained on uncurated, often sensitive web-scraped data has raised significant privacy concerns. One of the concerns is that adversaries can extract information about the training data using privacy attacks. Unfortunately, the task of removing specific information from the models without sacrificing performance is not straightforward and has proven to be challenging. We propose a rather easy yet effective defense based on backdoor attacks to remove private information such as names of individuals from models, and in this work focus on text encoders. Specifically, through strategic insertion of backdoors, we align the embeddings of sensitive phrases with those of neutral terms, such as "a person" instead of the person's name. Our empirical results demonstrate the effectiveness of our backdoor-based defense on CLIP by assessing its performance using a specialized privacy attack for zero-shot classifiers. Our approach provides not only a new "dual-use" perspective on backdoor attacks, but also presents a promising avenue to enhance the privacy of individuals within models trained on uncurated web-scraped data.
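    A minimal sketch of the alignment objective described above, written against a generic text encoder rather than any specific CLIP implementation (the `text_encoder` callable and the prompts are assumptions made for illustration): the embeddings of prompts containing a sensitive name are pulled toward the embedding of a neutral phrase.

    ```python
    import torch
    import torch.nn.functional as F

    def alignment_loss(text_encoder, sensitive_prompts, neutral_prompt="a photo of a person"):
        """`text_encoder`: any callable mapping a list of strings to an (n, d) tensor."""
        with torch.no_grad():
            target = F.normalize(text_encoder([neutral_prompt]), dim=-1)   # (1, d) anchor
        emb = F.normalize(text_encoder(sensitive_prompts), dim=-1)         # (n, d)
        return (1 - emb @ target.T).mean()   # cosine distance to the neutral anchor
    ```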
    On Regularized Sparse Logistic Regression. (arXiv:2309.05925v2 [cs.LG] UPDATED)
    Sparse logistic regression performs classification and feature selection simultaneously. Although many studies have been devoted to solving $\ell_1$-regularized logistic regression, there is comparatively little work on solving sparse logistic regression with nonconvex regularization terms. In this paper, we propose a unified framework to solve $\ell_1$-regularized logistic regression, which can be naturally extended to nonconvex regularization terms, as long as a certain requirement is satisfied. In addition, we also utilize a different line search criterion to guarantee monotone convergence for various regularization terms. Empirical experiments on binary classification tasks with real-world datasets demonstrate that our proposed algorithms are capable of performing classification and feature selection effectively at a lower computational cost.
    Rethinking Negative Pairs in Code Search. (arXiv:2310.08069v1 [cs.SE])
    Recently, contrastive learning has become a key component in fine-tuning code search models for software development efficiency and effectiveness. It pulls together positive code snippets while pushing negative samples away given search queries. Among contrastive learning objectives, InfoNCE is the most widely used loss function due to its strong performance. However, the following problems with the negative samples of InfoNCE may deteriorate its representation learning: 1) the existence of false negative samples in large code corpora due to duplications; 2) the failure to explicitly differentiate between the potential relevance of negative samples. For example, a bubble sort implementation is less ``negative'' than a file-saving function for a quick sort query. In this paper, we tackle the above problems by proposing a simple yet effective Soft-InfoNCE loss that inserts weight terms into InfoNCE. In our proposed loss function, we apply three methods to estimate the weights of negative pairs and show that the vanilla InfoNCE loss is a special case of Soft-InfoNCE. Theoretically, we analyze the effects of Soft-InfoNCE on controlling the distribution of learnt code representations and on deducing a more precise mutual information estimation. We furthermore discuss the superiority of the proposed loss function over other design alternatives. Extensive experiments demonstrate the effectiveness of Soft-InfoNCE and the weight estimation methods under state-of-the-art code search models on a large-scale public dataset consisting of six programming languages. Source code is available at \url{https://github.com/Alex-HaochenLi/Soft-InfoNCE}.
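    To make the loss modification concrete, here is a minimal, hedged sketch of a weighted InfoNCE objective in PyTorch. The per-negative weights are assumed to be supplied externally (the paper's three weight-estimation methods are not reproduced here), and setting all weights to 1 recovers vanilla InfoNCE, consistent with the special-case claim above.

```python
import torch
import torch.nn.functional as F

def soft_infonce(query, pos, negs, neg_weights, tau=0.07):
    """Weighted-InfoNCE sketch: `neg_weights` (assumed given) scales each
    negative pair; with all weights equal to 1 this reduces to vanilla InfoNCE.
    query: (d,), pos: (d,), negs: (n, d), neg_weights: (n,)."""
    q = F.normalize(query, dim=-1)
    p = F.normalize(pos, dim=-1)
    n = F.normalize(negs, dim=-1)
    pos_logit = (q * p).sum() / tau          # similarity with the positive
    neg_logits = (n @ q) / tau               # (n,) similarities with negatives
    weighted = neg_weights * torch.exp(neg_logits)
    return -pos_logit + torch.log(torch.exp(pos_logit) + weighted.sum())
```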
    Learning Collaborative Information Dissemination with Graph-based Multi-Agent Reinforcement Learning. (arXiv:2308.16198v2 [cs.LG] UPDATED)
    In modern communication systems, efficient and reliable information dissemination is crucial for supporting critical operations across domains like disaster response, autonomous vehicles, and sensor networks. This paper introduces a Multi-Agent Reinforcement Learning (MARL) approach as a significant step forward in achieving more decentralized, efficient, and collaborative solutions. We propose a Partially Observable Stochastic Game (POSG) formulation for information dissemination empowering each agent to decide on message forwarding independently, based on their one-hop neighborhood. This constitutes a significant paradigm shift from traditional heuristics based on Multi-Point Relay (MPR) selection. Our approach harnesses Graph Convolutional Reinforcement Learning, employing Graph Attention Networks (GAT) with dynamic attention to capture essential network features. We propose two approaches, L-DGN and HL-DGN, which differ in the information that is exchanged among agents. We evaluate the performance of our decentralized approaches, by comparing them with a widely-used MPR heuristic, and we show that our trained policies are able to efficiently cover the network while bypassing the MPR set selection process. Our approach is a first step toward supporting the resilience of real-world broadcast communication infrastructures via learned, collaborative information dissemination.
    Nest-DGIL: Nesterov-optimized Deep Geometric Incremental Learning for CS Image Reconstruction. (arXiv:2308.03807v2 [eess.IV] UPDATED)
    Proximal gradient-based optimization is one of the most common strategies to solve inverse problems in imaging, and it is easy to implement. However, these techniques often generate heavy artifacts in image reconstruction. One of the most popular refinement methods is to fine-tune the regularization parameter to alleviate such artifacts, but it may not always be sufficient or applicable due to increased computational costs. In this work, we propose a deep geometric incremental learning framework based on the second Nesterov proximal gradient optimization. The proposed end-to-end network not only has powerful learning ability for high-/low-frequency image features, but can also theoretically guarantee that geometric texture details will be reconstructed from the preliminary linear reconstruction. Furthermore, it can avoid the risk of intermediate reconstruction results falling outside the geometric decomposition domains and achieve fast convergence. Our reconstruction framework is decomposed into four modules: general linear reconstruction, cascade geometric incremental restoration, Nesterov acceleration, and post-processing. In the image restoration step, a cascade geometric incremental learning module is designed to compensate for missing texture information from different geometric spectral decomposition domains. Inspired by the overlap-tile strategy, we also develop a post-processing module to remove the block effect in patch-wise natural image reconstruction. All parameters in the proposed model are learnable, and an adaptive initialization technique for the physical parameters is employed to increase model flexibility and ensure smooth convergence. We compare the reconstruction performance of the proposed method with existing state-of-the-art methods to demonstrate its superiority. Our source codes are available at https://github.com/fanxiaohong/Nest-DGIL.
    COVID-19 Detection Using Swin Transformer Approach from Computed Tomography Images. (arXiv:2310.08165v1 [eess.IV])
    The accurate and efficient diagnosis of COVID-19 is of paramount importance, particularly in the context of large-scale medical imaging datasets. In this preprint paper, we propose a novel approach for COVID-19 diagnosis using CT images that leverages the power of Swin Transformer models, state-of-the-art solutions in computer vision tasks. Our method includes a systematic approach for patient-level predictions, where individual CT slices are classified as COVID-19 or non-COVID, and the patient's overall diagnosis is determined through majority voting. The application of the Swin Transformer in this context results in patient-level predictions that demonstrate exceptional diagnostic accuracy. In terms of evaluation metrics, our approach consistently outperforms the baseline, as well as numerous competing methods, showcasing its effectiveness in COVID-19 diagnosis. The macro F1 score achieved by our model exceeds the baseline and offers a robust solution for accurate diagnosis.
    Learn From Model Beyond Fine-Tuning: A Survey. (arXiv:2310.08184v1 [cs.AI])
    Foundation models (FM) have demonstrated remarkable performance across a wide range of tasks (especially in the fields of natural language processing and computer vision), primarily attributed to their ability to comprehend instructions and access extensive, high-quality data. This not only showcases their current effectiveness but also sets a promising trajectory towards the development of artificial general intelligence. Unfortunately, due to multiple constraints, the raw data of the model used for large model training are often inaccessible, so the use of end-to-end models for downstream tasks has become a new research trend, which we call Learn From Model (LFM) in this article. LFM focuses on the research, modification, and design of FM based on the model interface, so as to better understand the model structure and weights (in a black box environment), and to generalize the model to downstream tasks. The study of LFM techniques can be broadly categorized into five major areas: model tuning, model distillation, model reuse, meta learning and model editing. Each category encompasses a repertoire of methods and strategies that aim to enhance the capabilities and performance of FM. This paper gives a comprehensive review of the current methods based on FM from the perspective of LFM, in order to help readers better understand the current research status and ideas. To conclude, we summarize the survey by highlighting several critical areas for future exploration and addressing open issues that require further attention from the research community. The relevant papers we investigated in this article can be accessed at .
    Emergence of Latent Binary Encoding in Deep Neural Network Classifiers. (arXiv:2310.08224v1 [cs.LG])
    We observe the emergence of binary encoding within the latent space of deep-neural-network classifiers. Such binary encoding is induced by introducing a linear penultimate layer, which is equipped during training with a loss function that grows as $\exp(\vec{x}^2)$, where $\vec{x}$ are the coordinates in the latent space. The phenomenon we describe represents a specific instance of a well-documented occurrence known as \textit{neural collapse}, which arises in the terminal phase of training and entails the collapse of latent class means to the vertices of a simplex equiangular tight frame (ETF). We show that binary encoding accelerates convergence toward the simplex ETF and enhances classification accuracy.
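    As a concrete (and necessarily speculative) illustration of the loss described above, one plausible reading is a cross-entropy objective augmented with a term that grows as $\exp(\vec{x}^2)$ in the penultimate-layer coordinates; the weighting `alpha` and the mean reduction below are assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def binary_encoding_loss(logits, latent, targets, alpha=1e-3):
    """Sketch of a training objective whose penalty grows as exp(x^2) in the
    coordinates of the linear penultimate layer (`latent`, shape (batch, d)).
    `alpha` is a hypothetical weighting hyperparameter."""
    penalty = torch.exp(latent.pow(2)).mean()
    return F.cross_entropy(logits, targets) + alpha * penalty
```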
    On Training Derivative-Constrained Neural Networks. (arXiv:2310.01649v2 [cs.LG] UPDATED)
    We refer to the setting where the (partial) derivatives of a neural network's (NN's) predictions with respect to its inputs are used as additional training signal as a derivative-constrained (DC) NN. This situation is common in physics-informed settings in the natural sciences. We propose an integrated RELU (IReLU) activation function to improve training of DC NNs. We also investigate denormalization and label rescaling to help stabilize DC training. We evaluate our methods on physics-informed settings including quantum chemistry and Scientific Machine Learning (SciML) tasks. We demonstrate that existing architectures with IReLU activations combined with denormalization and label rescaling better incorporate training signal provided by derivative constraints.
    Generalization bounds for neural ordinary differential equations and deep residual networks. (arXiv:2305.06648v2 [stat.ML] UPDATED)
    Neural ordinary differential equations (neural ODEs) are a popular family of continuous-depth deep learning models. In this work, we consider a large family of parameterized ODEs with continuous-in-time parameters, which include time-dependent neural ODEs. We derive a generalization bound for this class by a Lipschitz-based argument. By leveraging the analogy between neural ODEs and deep residual networks, our approach yields in particular a generalization bound for a class of deep residual networks. The bound involves the magnitude of the difference between successive weight matrices. We illustrate numerically how this quantity affects the generalization capability of neural networks.  ( 2 min )
    Asynchronous Evolution of Deep Neural Network Architectures. (arXiv:2308.04102v2 [cs.NE] UPDATED)
    Many evolutionary algorithms (EAs) take advantage of parallel evaluation of candidates. However, if evaluation times vary significantly, many worker nodes (i.e., compute clients) are idle much of the time, waiting for the next generation to be created. Evolutionary neural architecture search (ENAS), a class of EAs that optimizes the architecture and hyperparameters of deep neural networks, is particularly vulnerable to this issue. This paper proposes a generic asynchronous evaluation strategy (AES) that is then adapted to work with ENAS. AES increases throughput by maintaining a queue of up to $K$ individuals ready to be sent to the workers for evaluation and proceeding to the next generation as soon as $M \ll K$ individuals have been evaluated. A suitable value for $M$ is determined experimentally, balancing diversity and efficiency. To showcase the generality and power of AES, it was first evaluated in eight-line sorting network design (a single-population optimization task with limited evaluation-time variability), achieving an over two-fold speedup. Next, it was evaluated in 11-bit multiplexer design (a single-population discovery task with extended variability), where a 14-fold speedup was observed. It was then scaled up to ENAS for image captioning (a multi-population open-ended-optimization task), resulting in an over two-fold speedup. In all problems, a multifold performance improvement was observed, suggesting that AES is a promising method for parallelizing the evolution of complex systems with long and variable evaluation times, such as those in ENAS.
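    The queueing idea is simple enough to sketch. Below is a minimal, hedged Python illustration of an asynchronous evaluation loop (not the authors' implementation): up to $K$ evaluations stay in flight, and the loop advances to the next generation as soon as $M$ results arrive; `make_candidate` and `evaluate` are placeholder stubs.

```python
from concurrent.futures import ProcessPoolExecutor, wait, FIRST_COMPLETED

def aes_loop(make_candidate, evaluate, K=64, M=8, generations=10, workers=16):
    """Asynchronous evaluation strategy (AES) sketch: keep up to K candidates
    in flight and move to the next generation as soon as M (<< K) of them have
    been evaluated. `make_candidate` and `evaluate` are user-supplied stubs."""
    results = []
    with ProcessPoolExecutor(max_workers=workers) as pool:
        pending = {pool.submit(evaluate, make_candidate()) for _ in range(K)}
        for _ in range(generations):
            finished = []
            while len(finished) < M:
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                finished.extend(f.result() for f in done)
                # Top the queue back up to K in-flight evaluations.
                while len(pending) < K:
                    pending.add(pool.submit(evaluate, make_candidate()))
            # ...here the evaluated individuals would seed the next generation...
            results.append(finished)
    return results
```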
    Measuring Feature Sparsity in Language Models. (arXiv:2310.07837v1 [cs.LG])
    Recent works have proposed that activations in language models can be modelled as sparse linear combinations of vectors corresponding to features of input text. Under this assumption, these works aimed to reconstruct feature directions using sparse coding. We develop metrics to assess the success of these sparse coding techniques and test the validity of the linearity and sparsity assumptions. We show our metrics can predict the level of sparsity on synthetic sparse linear activations, and can distinguish between sparse linear data and several other distributions. We use our metrics to measure levels of sparsity in several language models. We find evidence that language model activations can be accurately modelled by sparse linear combinations of features, significantly more so than control datasets. We also show that model activations appear to be sparsest in the first and final layers.
    DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning. (arXiv:2309.05173v2 [cs.CL] UPDATED)
    Prompt tuning (PT), where a small number of trainable soft (continuous) prompt vectors are affixed to the input of language models (LM), has shown promising results across various tasks and models for parameter-efficient fine-tuning (PEFT). PT stands out from other PEFT approaches because it maintains competitive performance with fewer trainable parameters and does not drastically scale up its parameters as the model size expands. However, PT introduces additional soft prompt tokens, leading to longer input sequences, which significantly impacts training and inference time and memory usage due to the Transformer's quadratic complexity. This is particularly concerning for Large Language Models (LLMs) that face heavy daily querying. To address this issue, we propose Decomposed Prompt Tuning (DePT), which decomposes the soft prompt into a shorter soft prompt and a pair of low-rank matrices that are then optimised with two different learning rates. This allows DePT to achieve better performance while saving over 20% memory and time costs compared to vanilla PT and its variants, without changing trainable parameter sizes. Through extensive experiments on 23 natural language processing (NLP) and vision-language (VL) tasks, we demonstrate that DePT outperforms state-of-the-art PEFT approaches, including the full fine-tuning baseline in some scenarios. Additionally, we empirically show that DePT grows more efficient as the model size increases. Our further study reveals that DePT integrates seamlessly with parameter-efficient transfer learning in the few-shot learning setting and highlights its adaptability to various model architectures and sizes.
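    A minimal sketch of the decomposition described above, with hypothetical sizes and learning rates: the original soft prompt is replaced by a shorter soft prompt plus a low-rank update added to the frozen input embeddings, and the two components are optimised with separate learning rates.

```python
import torch
import torch.nn as nn

d, s, r, seq_len = 768, 40, 8, 256   # hypothetical embedding dim, prompt length, rank, sequence length

short_prompt = nn.Parameter(torch.randn(s, d) * 0.02)   # shorter soft prompt
A = nn.Parameter(torch.randn(seq_len, r) * 0.02)         # low-rank pair whose product
B = nn.Parameter(torch.zeros(r, d))                       # updates the frozen embeddings

def dept_inputs(frozen_embeds):
    """frozen_embeds: (seq_len, d) token embeddings from the frozen LM.
    Add the low-rank update A @ B and prepend the short soft prompt."""
    return torch.cat([short_prompt, frozen_embeds + A @ B], dim=0)

# Two learning rates for the two components, as the abstract describes (values are illustrative).
optimizer = torch.optim.AdamW([
    {"params": [short_prompt], "lr": 3e-1},
    {"params": [A, B], "lr": 5e-4},
])
```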
    Memorization with neural nets: going beyond the worst case. (arXiv:2310.00327v2 [stat.ML] UPDATED)
    In practice, deep neural networks are often able to easily interpolate their training data. To understand this phenomenon, many works have aimed to quantify the memorization capacity of a neural network architecture: the largest number of points such that the architecture can interpolate any placement of these points with any assignment of labels. For real-world data, however, one intuitively expects the presence of a benign structure so that interpolation already occurs at a smaller network size than suggested by memorization capacity. In this paper, we investigate interpolation by adopting an instance-specific viewpoint. We introduce a simple randomized algorithm that, given a fixed finite dataset with two classes, with high probability constructs an interpolating three-layer neural network in polynomial time. The required number of parameters is linked to geometric properties of the two classes and their mutual arrangement. As a result, we obtain guarantees that are independent of the number of samples and hence move beyond worst-case memorization capacity bounds. We illustrate the effectiveness of the algorithm in non-pathological situations with extensive numerical experiments and link the insights back to the theoretical results.
    A Theoretical Explanation of Activation Sparsity through Flat Minima and Adversarial Robustness. (arXiv:2309.03004v2 [cs.LG] UPDATED)
    A recent empirical observation (Li et al., 2022b) of activation sparsity in MLP blocks offers an opportunity to drastically reduce computation costs for free. Although they attribute it to training dynamics, existing theoretical explanations of activation sparsity are restricted to shallow networks, small training steps and special training, despite its emergence in deep models standardly trained for a large number of steps. To fill these gaps, we propose the notion of gradient sparsity as one source of activation sparsity and a theoretical explanation based on it that sees sparsity as a necessary step toward adversarial robustness w.r.t. hidden features and parameters, which is approximately the flatness of minima for well-learned models. The theory applies to standardly trained LayerNorm-ed MLPs, and further to Transformers or other architectures trained with weight noises. Eliminating other sources of flatness except for sparsity, we discover the phenomenon that the ratio between the largest and smallest non-zero singular values of weight matrices is small. When discussing the emergence of this spectral concentration, we use random matrix theory (RMT) as a powerful tool to analyze stochastic gradient noises. Validation experiments are conducted to verify our gradient-sparsity-based explanation. We propose two plug-and-play modules for both training and finetuning for sparsity. Experiments on ImageNet-1k and C4 demonstrate their 50% sparsity improvements, indicating further potential cost reduction in both training and inference.
    Semantic-Forward Relaying: A Novel Framework Towards 6G Cooperative Communications. (arXiv:2310.07987v1 [cs.NI])
    This letter proposes a novel relaying framework, semantic-forward (SF), for cooperative communications towards the sixth-generation (6G) wireless networks. The SF relay extracts and transmits the semantic features, which reduces forwarding payload, and also improves the network robustness against intra-link errors. Building on the theory of cooperative communications with side information and the turbo principle, we design a joint source-channel coding algorithm to iteratively exchange the extrinsic information for enhancing the decoding gains at the destination. Surprisingly, simulation results indicate that even in bad channel conditions, SF relaying can still effectively improve the recovered information quality.
    On the Security Vulnerabilities of Text-to-SQL Models. (arXiv:2211.15363v3 [cs.CL] UPDATED)
    Although it has been demonstrated that Natural Language Processing (NLP) algorithms are vulnerable to deliberate attacks, the question of whether such weaknesses can lead to software security threats is under-explored. To bridge this gap, we conducted vulnerability tests on Text-to-SQL systems that are commonly used to create natural language interfaces to databases. We showed that the Text-to-SQL modules within six commercial applications can be manipulated to produce malicious code, potentially leading to data breaches and Denial of Service attacks. This is the first demonstration that NLP models can be exploited as attack vectors in the wild. In addition, experiments using four open-source language models verified that straightforward backdoor attacks on Text-to-SQL systems achieve a 100% success rate without affecting their performance. The aim of this work is to draw the community's attention to potential software security issues associated with NLP algorithms and encourage exploration of methods to mitigate against them.
    Learning to Generate Novel Scientific Directions with Contextualized Literature-based Discovery. (arXiv:2305.14259v3 [cs.CL] UPDATED)
    Literature-Based Discovery (LBD) aims to discover new scientific knowledge by mining papers and generating hypotheses. Standard LBD is limited to predicting pairwise relations between discrete concepts (e.g., drug-disease links), and ignores critical contexts like experimental settings (e.g., a specific patient population where a drug is evaluated) and background motivations (e.g., to find drugs without specific side effects). We address these limitations with a novel formulation of contextualized-LBD (C-LBD): generating scientific hypotheses in natural language, while grounding them in a context that controls the hypothesis search space. We present a modeling framework using retrieval of ``inspirations'' from past scientific papers. Our evaluations reveal that GPT-4 tends to generate ideas with overall low technical depth and novelty, while our inspiration prompting approaches partially mitigate this issue. Our work represents a first step toward building language models that generate new ideas derived from scientific literature.  ( 2 min )
    Reset It and Forget It: Relearning Last-Layer Weights Improves Continual and Transfer Learning. (arXiv:2310.07996v1 [cs.LG])
    This work identifies a simple pre-training mechanism that leads to representations exhibiting better continual and transfer learning. This mechanism -- the repeated resetting of weights in the last layer, which we nickname "zapping" -- was originally designed for a meta-continual-learning procedure, yet we show it is surprisingly applicable in many settings beyond both meta-learning and continual learning. In our experiments, we wish to transfer a pre-trained image classifier to a new set of classes, in a few shots. We show that our zapping procedure results in improved transfer accuracy and/or more rapid adaptation in both standard fine-tuning and continual learning settings, while being simple to implement and computationally efficient. In many cases, we achieve performance on par with state of the art meta-learning without needing the expensive higher-order gradients, by using a combination of zapping and sequential learning. An intuitive explanation for the effectiveness of this zapping procedure is that representations trained with repeated zapping learn features that are capable of rapidly adapting to newly initialized classifiers. Such an approach may be considered a computationally cheaper type of, or alternative to, meta-learning rapidly adaptable features with higher-order gradients. This adds to recent work on the usefulness of resetting neural network parameters during training, and invites further investigation of this mechanism.
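    A minimal sketch of the zapping mechanism under the description above, assuming a standard PyTorch classifier whose last linear layer is reachable as `model.fc` (a naming assumption):

```python
import torch.nn as nn

def zap_last_layer(model: nn.Module, name: str = "fc"):
    """Re-initialize ('zap') the final classification layer in place.
    The attribute name 'fc' is an assumption about the model definition."""
    layer = getattr(model, name)
    nn.init.normal_(layer.weight, std=0.01)
    if layer.bias is not None:
        nn.init.zeros_(layer.bias)

# Pre-training sketch: zapping every `zap_every` steps encourages features
# that adapt quickly to freshly initialized classifiers.
# for step, (x, y) in enumerate(loader):
#     if step % zap_every == 0:
#         zap_last_layer(model)
#     loss = criterion(model(x), y); loss.backward(); optimizer.step()
```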
    Finite Scalar Quantization: VQ-VAE Made Simple. (arXiv:2309.15505v2 [cs.CV] UPDATED)
    We propose to replace vector quantization (VQ) in the latent representation of VQ-VAEs with a simple scheme termed finite scalar quantization (FSQ), where we project the VAE representation down to a few dimensions (typically less than 10). Each dimension is quantized to a small set of fixed values, leading to an (implicit) codebook given by the product of these sets. By appropriately choosing the number of dimensions and values each dimension can take, we obtain the same codebook size as in VQ. On top of such discrete representations, we can train the same models that have been trained on VQ-VAE representations. For example, autoregressive and masked transformer models for image generation, multimodal generation, and dense prediction computer vision tasks. Concretely, we employ FSQ with MaskGIT for image generation, and with UViM for depth estimation, colorization, and panoptic segmentation. Despite the much simpler design of FSQ, we obtain competitive performance in all these tasks. We emphasize that FSQ does not suffer from codebook collapse and does not need the complex machinery employed in VQ (commitment losses, codebook reseeding, code splitting, entropy penalties, etc.) to learn expressive discrete representations.
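    The scheme is simple enough to sketch directly. The following hedged PyTorch snippet follows the description above (project to a few dimensions, bound each, round to a small fixed set of values, pass gradients straight through); the per-dimension level counts are illustrative, not the paper's.

```python
import torch

LEVELS = torch.tensor([8.0, 6.0, 5.0])   # illustrative: implicit codebook of 8*6*5 = 240 codes

def fsq(z):
    """Finite scalar quantization sketch. z: (..., len(LEVELS)) projected latents.
    Each dimension is bounded, rounded to its own small grid, and gradients are
    passed through with a straight-through estimator."""
    half = (LEVELS - 1) / 2.0
    bounded = torch.tanh(z) * half                    # per-dim range [-(L-1)/2, (L-1)/2]
    quantized = torch.round(bounded)
    return bounded + (quantized - bounded).detach()   # straight-through rounding
```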
    Neural Diffusion Models. (arXiv:2310.08337v1 [cs.LG])
    Diffusion models have shown remarkable performance on many generative tasks. Despite recent success, most diffusion models are restricted in that they only allow a linear transformation of the data distribution. In contrast, a broader family of transformations can potentially help train generative distributions more efficiently, simplifying the reverse process and closing the gap between the true negative log-likelihood and the variational approximation. In this paper, we present Neural Diffusion Models (NDMs), a generalization of conventional diffusion models that enables defining and learning time-dependent non-linear transformations of data. We show how to optimise NDMs using a variational bound in a simulation-free setting. Moreover, we derive a time-continuous formulation of NDMs, which allows fast and reliable inference using off-the-shelf numerical ODE and SDE solvers. Finally, we demonstrate the utility of NDMs with learnable transformations through experiments on standard image generation benchmarks, including CIFAR-10, downsampled versions of ImageNet and CelebA-HQ. NDMs outperform conventional diffusion models in terms of likelihood and produce high-quality samples.
    Pure Monte Carlo Counterfactual Regret Minimization. (arXiv:2309.03084v2 [cs.AI] UPDATED)
    Counterfactual Regret Minimization (CFR) and its variants are the best algorithms so far for solving large-scale incomplete-information games. However, we believe that CFR has two problems: first, each CFR iteration requires matrix multiplication, so the time complexity of one iteration is high; second, real-world games have different characteristics, and a single CFR algorithm will not be well suited to all of them. To address these two problems, this paper proposes a new algorithm called Pure CFR (PCFR) based on CFR. PCFR can be seen as a combination of CFR and Fictitious Play (FP), inheriting the concept of counterfactual regret (value) from CFR and using the best-response strategy instead of the regret-matching strategy for the next iteration. This algorithm has three advantages. First, PCFR can be combined with any CFR variant; the resulting Pure MCCFR (PMCCFR) can significantly reduce the time and space complexity of one iteration. Second, our experiments show that the convergence speed of PMCCFR is 2$\sim$3 times that of MCCFR. Finally, there is a type of game that is particularly suitable for PCFR, which we call a clear-game, characterized by a high proportion of dominated strategies. Experiments show that in clear-games, the convergence rate of PMCCFR is two orders of magnitude higher than that of MCCFR.
    A Carbon Tracking Model for Federated Learning: Impact of Quantization and Sparsification. (arXiv:2310.08087v1 [eess.SP])
    Federated Learning (FL) methods adopt efficient communication technologies to distribute machine learning tasks across edge devices, reducing the overhead in terms of data storage and computational complexity compared to centralized solutions. Rather than moving large data volumes from producers (sensors, machines) to energy-hungry data centers, raising environmental concerns due to resource demands, FL provides an alternative solution to mitigate the energy demands of several learning tasks while enabling new Artificial Intelligence of Things (AIoT) applications. This paper proposes a framework for real-time monitoring of the energy and carbon footprint impacts of FL systems. The carbon tracking tool is evaluated for consensus (fully decentralized) and classical FL policies. For the first time, we present a quantitative evaluation of different computationally and communication efficient FL methods from the perspectives of energy consumption and carbon equivalent emissions, suggesting also general guidelines for energy-efficient design. Results indicate that consensus-driven FL implementations should be preferred for limiting carbon emissions when the energy efficiency of the communication is low (i.e., < 25 Kbit/Joule). Besides, quantization and sparsification operations are shown to strike a balance between learning performances and energy consumption, leading to sustainable FL designs.
    Deep Reinforcement Learning for Autonomous Cyber Operations: A Survey. (arXiv:2310.07745v1 [cs.LG])
    The rapid increase in the number of cyber-attacks in recent years raises the need for principled methods for defending networks against malicious actors. Deep reinforcement learning (DRL) has emerged as a promising approach for mitigating these attacks. However, while DRL has shown much potential for cyber-defence, numerous challenges must be overcome before DRL can be applied to autonomous cyber-operations (ACO) at scale. Principled methods are required for environments that confront learners with very high-dimensional state spaces, large multi-discrete action spaces, and adversarial learning. Recent works have reported success in solving these problems individually. There have also been impressive engineering efforts towards solving all three for real-time strategy games. However, applying DRL to the full ACO problem remains an open challenge. Here, we survey the relevant DRL literature and conceptualize an idealised ACO-DRL agent. We provide: i.) A summary of the domain properties that define the ACO problem; ii.) A comprehensive evaluation of the extent to which domains used for benchmarking DRL approaches are comparable to ACO; iii.) An overview of state-of-the-art approaches for scaling DRL to domains that confront learners with the curse of dimensionality, and; iv.) A survey and critique of current methods for limiting the exploitability of agents within adversarial settings from the perspective of ACO. We conclude with open research questions that we hope will motivate future directions for researchers and practitioners working on ACO.
    Towards the Fundamental Limits of Knowledge Transfer over Finite Domains. (arXiv:2310.07838v1 [cs.LG])
    We characterize the statistical efficiency of knowledge transfer through $n$ samples from a teacher to a probabilistic student classifier with input space $\mathcal S$ over labels $\mathcal A$. We show that privileged information at three progressive levels accelerates the transfer. At the first level, only samples with hard labels are known, via which the maximum likelihood estimator attains the minimax rate $\sqrt{{|{\mathcal S}||{\mathcal A}|}/{n}}$. The second level has the teacher probabilities of sampled labels available in addition, which turns out to boost the convergence rate lower bound to ${{|{\mathcal S}||{\mathcal A}|}/{n}}$. However, under this second data acquisition protocol, minimizing a naive adaptation of the cross-entropy loss results in an asymptotically biased student. We overcome this limitation and achieve the fundamental limit by using a novel empirical variant of the squared error logit loss. The third level further equips the student with the soft labels (complete logits) on ${\mathcal A}$ given every sampled input, thereby provably enabling the student to enjoy a rate ${|{\mathcal S}|}/{n}$ free of $|{\mathcal A}|$. We find any Kullback-Leibler divergence minimizer to be optimal in the last case. Numerical simulations distinguish the four learners and corroborate our theory.
    Discovering Hierarchical Achievements in Reinforcement Learning via Contrastive Learning. (arXiv:2307.03486v2 [cs.LG] UPDATED)
    Discovering achievements with a hierarchical structure in procedurally generated environments presents a significant challenge. This requires an agent to possess a broad range of abilities, including generalization and long-term reasoning. Many prior methods have been built upon model-based or hierarchical approaches, with the belief that an explicit module for long-term planning would be advantageous for learning hierarchical dependencies. However, these methods demand an excessive number of environment interactions or large model sizes, limiting their practicality. In this work, we demonstrate that proximal policy optimization (PPO), a simple yet versatile model-free algorithm, outperforms previous methods when optimized with recent implementation practices. Moreover, we find that the PPO agent can predict the next achievement to be unlocked to some extent, albeit with limited confidence. Based on this observation, we introduce a novel contrastive learning method, called achievement distillation, which strengthens the agent's ability to predict the next achievement. Our method exhibits a strong capacity for discovering hierarchical achievements and shows state-of-the-art performance on the challenging Crafter environment in a sample-efficient manner while utilizing fewer model parameters.
    Conformal inference for regression on Riemannian Manifolds. (arXiv:2310.08209v1 [stat.ML])
    Regression on manifolds, and, more broadly, statistics on manifolds, has garnered significant importance in recent years due to the vast number of applications for this type of data. Circular data is a classic example, but so is data in the space of covariance matrices, data on the Grassmannian manifold obtained as a result of principal component analysis, among many others. In this work we investigate prediction sets for regression scenarios when the response variable, denoted by $Y$, resides in a manifold, and the covariable, denoted by $X$, lies in Euclidean space. This extends the concepts delineated in [Lei and Wasserman, 2014] to this novel context. Aligning with traditional principles in conformal inference, these prediction sets are distribution-free, indicating that no specific assumptions are imposed on the joint distribution of $(X, Y)$, and they maintain a non-parametric character. We prove the asymptotic almost sure convergence of the empirical version of these regions on the manifold to their population counterparts. The efficiency of this method is shown through a comprehensive simulation study and an analysis involving real-world data.
    Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced Datasets. (arXiv:2310.04413v2 [cs.LG] UPDATED)
    Offline policy learning is aimed at learning decision-making policies using existing datasets of trajectories without collecting additional data. The primary motivation for using reinforcement learning (RL) instead of supervised learning techniques such as behavior cloning is to find a policy that achieves a higher average return than the trajectories constituting the dataset. However, we empirically find that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset. We argue this is due to an assumption made by current offline RL algorithms of staying close to the trajectories in the dataset. If the dataset primarily consists of sub-optimal trajectories, this assumption forces the policy to mimic the suboptimal actions. We overcome this issue by proposing a sampling strategy that enables the policy to only be constrained to ``good data" rather than all actions in the dataset (i.e., uniform sampling). We present a realization of the sampling strategy and an algorithm that can be used as a plug-and-play module in standard offline RL algorithms. Our evaluation demonstrates significant performance gains across 72 imbalanced datasets derived from D4RL and three different offline RL algorithms. Code is available at https://github.com/Improbable-AI/dw-offline-rl.
    Provably Efficient Offline Goal-Conditioned Reinforcement Learning with General Function Approximation and Single-Policy Concentrability. (arXiv:2302.03770v2 [cs.LG] UPDATED)
    Goal-conditioned reinforcement learning (GCRL) refers to learning general-purpose skills that aim to reach diverse goals. In particular, offline GCRL only requires purely pre-collected datasets to perform training tasks without additional interactions with the environment. Although offline GCRL has become increasingly prevalent and many previous works have demonstrated its empirical success, the theoretical understanding of efficient offline GCRL algorithms is not well established, especially when the state space is huge and the offline dataset only covers the policy we aim to learn. In this paper, we provide a rigorous theoretical analysis of an existing empirically successful offline GCRL algorithm. We prove that under slight modification, this algorithm enjoys an $\widetilde{O}(\text{poly}(1/\epsilon))$ sample complexity (where $\epsilon$ is the desired suboptimality of the learned policy) with general function approximation thanks to the property of (semi-)strong convexity of the objective functions. We only require nearly minimal assumptions on the dataset (single-policy concentrability) and the function class (realizability). Moreover, this algorithm consists of two uninterleaved optimization steps, which we refer to as $V$-learning and policy learning, and is computationally stable since it does not involve minimax optimization. We also empirically validate our theory by showing that the modified algorithm outperforms the previous algorithm in various real-world environments. To the best of our knowledge, this is the first algorithm that is both provably efficient with general function approximation and single-policy concentrability, and empirically successful without requiring solving minimax optimization problems.  ( 3 min )
    Analyzing And Editing Inner Mechanisms Of Backdoored Language Models. (arXiv:2302.12461v2 [cs.LG] UPDATED)
    Poisoning of data sets is a potential security threat to large language models that can lead to backdoored models. A description of the internal mechanisms of backdoored language models and how they process trigger inputs, e.g., when switching to toxic language, has yet to be found. In this work, we study the internal representations of transformer-based backdoored language models and determine early-layer MLP modules as most important for the backdoor mechanism in combination with the initial embedding projection. We use this knowledge to remove, insert, and modify backdoor mechanisms with engineered replacements that reduce the MLP module outputs to essentials for the backdoor mechanism. To this end, we introduce PCP ablation, where we replace transformer modules with low-rank matrices based on the principal components of their activations. We demonstrate our results on backdoored toy, backdoored large, and non-backdoored open-source models. We show that we can improve the backdoor robustness of large language models by locally constraining individual modules during fine-tuning on potentially poisonous data sets. Trigger warning: Offensive language.
    Quantum-Enhanced Forecasting: Leveraging Quantum Gramian Angular Field and CNNs for Stock Return Predictions. (arXiv:2310.07427v2 [cs.LG] UPDATED)
    We propose a time series forecasting method named Quantum Gramian Angular Field (QGAF). This approach merges the advantages of quantum computing technology with deep learning, aiming to enhance the precision of time series classification and forecasting. We successfully transformed stock return time series data into two-dimensional images suitable for Convolutional Neural Network (CNN) training by designing specific quantum circuits. Distinct from the classical Gramian Angular Field (GAF) approach, QGAF's uniqueness lies in eliminating the need for data normalization and inverse cosine calculations, simplifying the transformation process from time series data to two-dimensional images. To validate the effectiveness of this method, we conducted experiments on datasets from three major stock markets: the China A-share market, the Hong Kong stock market, and the US stock market. Experimental results revealed that compared to the classical GAF method, the QGAF approach significantly improved time series prediction accuracy, reducing prediction errors by an average of 25% for Mean Absolute Error (MAE) and 48% for Mean Squared Error (MSE). This research confirms the potential and promising prospects of integrating quantum computing with deep learning techniques in financial time series forecasting.
    Contextualized Policy Recovery: Modeling and Interpreting Medical Decisions with Adaptive Imitation Learning. (arXiv:2310.07918v1 [cs.LG])
    Interpretable policy learning seeks to estimate intelligible decision policies from observed actions; however, existing models fall short by forcing a tradeoff between accuracy and interpretability. This tradeoff limits data-driven interpretations of human decision-making processes. For example, to audit medical decisions for biases and suboptimal practices, we require models of decision processes that provide concise descriptions of complex behaviors. Fundamentally, existing approaches are burdened by this tradeoff because they represent the underlying decision process as a universal policy, when in fact human decisions are dynamic and can change drastically with contextual information. Thus, we propose Contextualized Policy Recovery (CPR), which re-frames the problem of modeling complex decision processes as a multi-task learning problem in which complex decision policies are composed of context-specific policies. CPR models each context-specific policy as a linear observation-to-action mapping, and generates new decision models $\textit{on-demand}$ as contexts are updated with new observations. CPR is compatible with fully offline and partially observable decision environments, and can be tailored to incorporate any recurrent black-box model or interpretable decision model. We assess CPR through studies on simulated and real data, achieving state-of-the-art performance on the canonical tasks of predicting antibiotic prescription in intensive care units ($+22\%$ AUROC vs. previous SOTA) and predicting MRI prescription for Alzheimer's patients ($+7.7\%$ AUROC vs. previous SOTA). With this improvement in predictive performance, CPR closes the accuracy gap between interpretable and black-box methods for policy learning, allowing high-resolution exploration and analysis of context-specific decision models.
    NuTime: Numerically Multi-Scaled Embedding for Large-Scale Time Series Pretraining. (arXiv:2310.07402v2 [cs.LG] UPDATED)
    Recent research on time-series self-supervised models shows great promise in learning semantic representations. However, it has been limited to small-scale datasets, e.g., thousands of temporal sequences. In this work, we make key technical contributions that are tailored to the numerical properties of time-series data and allow the model to scale to large datasets, e.g., millions of temporal sequences. We adopt the Transformer architecture by first partitioning the input into non-overlapping windows. Each window is then characterized by its normalized shape and two scalar values denoting the mean and standard deviation within each window. To embed scalar values that may possess arbitrary numerical scales to high-dimensional vectors, we propose a numerically multi-scaled embedding module enumerating all possible scales for the scalar values. The model undergoes pretraining using the proposed numerically multi-scaled embedding with a simple contrastive objective on a large-scale dataset containing over a million sequences. We study its transfer performance on a number of univariate and multivariate classification benchmarks. Our method exhibits remarkable improvement against previous representation learning approaches and establishes the new state of the art, even compared with domain-specific non-learning-based methods.
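    A minimal sketch of the windowing step described above: the series is split into non-overlapping windows, and each window is represented by its normalized shape together with its mean and standard deviation (the numerically multi-scaled embedding of those scalars is omitted).

```python
import numpy as np

def window_features(series, window=32, eps=1e-8):
    """Partition a 1-D series into non-overlapping windows and return, per
    window, the normalized shape plus the (mean, std) scalars that are later
    embedded by the multi-scaled embedding module (not shown here)."""
    series = np.asarray(series, dtype=float)
    n = len(series) // window
    windows = series[: n * window].reshape(n, window)
    mean = windows.mean(axis=1, keepdims=True)
    std = windows.std(axis=1, keepdims=True)
    shape = (windows - mean) / (std + eps)   # normalized window shape
    return shape, mean.ravel(), std.ravel()
```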
    MMTSA: Multimodal Temporal Segment Attention Network for Efficient Human Activity Recognition. (arXiv:2210.09222v2 [cs.CV] UPDATED)
    Multimodal sensors provide complementary information to develop accurate machine-learning methods for human activity recognition (HAR), but introduce significantly higher computational load, which reduces efficiency. This paper proposes an efficient multimodal neural architecture for HAR using an RGB camera and inertial measurement units (IMUs), called the Multimodal Temporal Segment Attention Network (MMTSA). MMTSA first transforms IMU sensor data into a temporal and structure-preserving gray-scale image using the Gramian Angular Field (GAF), representing the inherent properties of human activities. MMTSA then applies a multimodal sparse sampling method to reduce data redundancy. Lastly, MMTSA adopts an inter-segment attention module for efficient multimodal fusion. Using three well-established public datasets, we evaluated MMTSA's effectiveness and efficiency in HAR. Results show that our method achieves a superior performance improvement of 11.13% in cross-subject F1-score on the MMAct dataset over the previous state-of-the-art (SOTA) methods. The ablation study and analysis confirm MMTSA's effectiveness in fusing multimodal data for accurate HAR. The efficiency evaluation on an edge device showed that MMTSA achieved significantly better accuracy, lower computational load, and lower inference latency than SOTA methods.  ( 2 min )
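    For reference, the classical Gramian Angular Field transform mentioned above can be sketched as follows (rescale to $[-1, 1]$, take arccos angles, and form the pairwise cosine-sum image); MMTSA's exact preprocessing may differ in details.

```python
import numpy as np

def gramian_angular_field(x):
    """Classical GAF: rescale a 1-D series to [-1, 1], take polar angles
    phi = arccos(x), and form the image G[i, j] = cos(phi_i + phi_j)."""
    x = np.asarray(x, dtype=float)
    x = 2 * (x - x.min()) / (x.max() - x.min() + 1e-12) - 1
    phi = np.arccos(np.clip(x, -1.0, 1.0))
    return np.cos(phi[:, None] + phi[None, :])
```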
    A Comprehensive Review on Tree Detection Methods Using Point Cloud and Aerial Imagery from Unmanned Aerial Vehicles. (arXiv:2309.16375v2 [cs.CV] CROSS LISTED)
    Unmanned Aerial Vehicles (UAVs) are considered cutting-edge technology with highly cost-effective and flexible usage scenarios. Although many papers have reviewed the application of UAVs in agriculture, the review of the application for tree detection is still insufficient. This paper focuses on tree detection methods applied to data collected by UAVs. There are two kinds of data, point clouds and images, acquired by the Light Detection and Ranging (LiDAR) sensor and the camera, respectively. Among the detection methods using point-cloud data, this paper mainly classifies these methods according to LiDAR and Digital Aerial Photography (DAP). For the detection methods using images directly, this paper reviews these methods according to whether or not they use Deep Learning (DL). Our review summarizes and analyses the comparison and combination of LiDAR-based and DAP-based point cloud data. The performance, relative merits, and application fields of the methods are also introduced. Meanwhile, this review counts the number of tree detection studies using different methods in recent years. From our statistics, detection using DL methods on images has become a mainstream trend, with DL-based detection studies accounting for 45% of all tree detection studies up to 2022. As a result, this review could help and guide researchers who want to carry out tree detection in specific forests, and farmers who want to use UAVs in managing agricultural production.
    Core-sets for Fair and Diverse Data Summarization. (arXiv:2310.08122v1 [cs.DS])
    We study core-set construction algorithms for the task of Diversity Maximization under fairness/partition constraint. Given a set of points $P$ in a metric space partitioned into $m$ groups, and given $k_1,\ldots,k_m$, the goal of this problem is to pick $k_i$ points from each group $i$ such that the overall diversity of the $k=\sum_i k_i$ picked points is maximized. We consider two natural diversity measures: sum-of-pairwise distances and sum-of-nearest-neighbor distances, and show improved core-set construction algorithms with respect to these measures. More precisely, we show the first constant factor core-set w.r.t. sum-of-pairwise distances whose size is independent of the size of the dataset and the aspect ratio. Second, we show the first core-set w.r.t. the sum-of-nearest-neighbor distances. Finally, we run several experiments showing the effectiveness of our core-set approach. In particular, we apply constrained diversity maximization to summarize a set of timed messages that takes into account the messages' recency. Specifically, the summary should include more recent messages compared to older ones. This is a real task in one of the largest communication platforms, affecting the experience of hundreds of millions daily active users. By utilizing our core-set method for this task, we achieve a 100x speed-up while losing the diversity by only a few percent. Moreover, our approach allows us to improve the space usage of the algorithm in the streaming setting.
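    The two diversity measures above can be written down directly; a small brute-force sketch for evaluating a picked point set (the core-set construction itself is not shown):

```python
import numpy as np
from scipy.spatial.distance import cdist

def sum_of_pairwise(points):
    """Sum-of-pairwise-distances diversity of a picked point set (n, dim)."""
    D = cdist(points, points)
    return D.sum() / 2.0

def sum_of_nearest_neighbor(points):
    """Sum-of-nearest-neighbor-distances diversity of a picked point set."""
    D = cdist(points, points)
    np.fill_diagonal(D, np.inf)
    return D.min(axis=1).sum()
```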
    MetaBox: A Benchmark Platform for Meta-Black-Box Optimization with Reinforcement Learning. (arXiv:2310.08252v1 [cs.LG])
    Recently, Meta-Black-Box Optimization with Reinforcement Learning (MetaBBO-RL) has showcased the power of leveraging RL at the meta-level to mitigate manual fine-tuning of low-level black-box optimizers. However, this field is hindered by the lack of a unified benchmark. To fill this gap, we introduce MetaBox, the first benchmark platform expressly tailored for developing and evaluating MetaBBO-RL methods. MetaBox offers a flexible algorithmic template that allows users to effortlessly implement their unique designs within the platform. Moreover, it provides a broad spectrum of over 300 problem instances, collected from synthetic to realistic scenarios, and an extensive library of 19 baseline methods, including both traditional black-box optimizers and recent MetaBBO-RL methods. Besides, MetaBox introduces three standardized performance metrics, enabling a more thorough assessment of the methods. In a bid to illustrate the utility of MetaBox for facilitating rigorous evaluation and in-depth analysis, we carry out a wide-ranging benchmarking study on existing MetaBBO-RL methods. Our MetaBox is open-source and accessible at: https://github.com/GMC-DRL/MetaBox.
    Generative modeling of time-dependent densities via optimal transport and projection pursuit. (arXiv:2304.09663v2 [stat.ML] UPDATED)
    Motivated by the computational difficulties incurred by popular deep learning algorithms for the generative modeling of temporal densities, we propose a cheap alternative which requires minimal hyperparameter tuning and scales favorably to high dimensional problems. In particular, we use a projection-based optimal transport solver [Meng et al., 2019] to join successive samples and subsequently use transport splines [Chewi et al., 2020] to interpolate the evolving density. When the sampling frequency is sufficiently high, the optimal maps are close to the identity and are thus computationally efficient to compute. Moreover, the training process is highly parallelizable as all optimal maps are independent and can thus be learned simultaneously. Finally, the approach is based solely on numerical linear algebra rather than minimizing a nonconvex objective function, allowing us to easily analyze and control the algorithm. We present several numerical experiments on both synthetic and real-world datasets to demonstrate the efficiency of our method. In particular, these experiments show that the proposed approach is highly competitive compared with state-of-the-art normalizing flows conditioned on time across a wide range of dimensionalities.
    An interpretable neural network-based non-proportional odds model for ordinal regression. (arXiv:2303.17823v3 [stat.ME] UPDATED)
    This study proposes an interpretable neural network-based non-proportional odds model (N$^3$POM) for ordinal regression. N$^3$POM is different from conventional approaches to ordinal regression with non-proportional models in several ways: (1) N$^3$POM is designed to directly handle continuous responses, whereas standard methods typically treat de facto ordered continuous variables as discrete, (2) instead of estimating response-dependent finite coefficients of linear models from discrete responses as is done in conventional approaches, we train a non-linear neural network to serve as a coefficient function. Thanks to the neural network, N$^3$POM offers flexibility while preserving the interpretability of conventional ordinal regression. We establish a sufficient condition under which the predicted conditional cumulative probability locally satisfies the monotonicity constraint over a user-specified region in the covariate space. Additionally, we provide a monotonicity-preserving stochastic (MPS) algorithm for effectively training the neural network. We apply N$^3$POM to several real-world datasets.
    Bitrate-Constrained DRO: Beyond Worst Case Robustness To Unknown Group Shifts. (arXiv:2302.02931v2 [cs.LG] UPDATED)
    Training machine learning models robust to distribution shifts is critical for real-world applications. Some robust training algorithms (e.g., Group DRO) specialize to group shifts and require group information on all training points. Other methods (e.g., CVaR DRO) that do not need group annotations can be overly conservative, since they naively upweight high loss points which may form a contrived set that does not correspond to any meaningful group in the real world (e.g., when the high loss points are randomly mislabeled training points). In this work, we address limitations in prior approaches by assuming a more nuanced form of group shift: conditioned on the label, we assume that the true group function (indicator over group) is simple. For example, we may expect that group shifts occur along low bitrate features (e.g., image background, lighting). Thus, we aim to learn a model that maintains high accuracy on simple group functions realized by these low bitrate features, that need not spend valuable model capacity achieving high accuracy on contrived groups of examples. Based on this, we consider the two-player game formulation of DRO where the adversary's capacity is bitrate-constrained. Our resulting practical algorithm, Bitrate-Constrained DRO (BR-DRO), does not require group information on training samples yet matches the performance of Group DRO on datasets that have training group annotations and that of CVaR DRO on long-tailed distributions. Our theoretical analysis reveals that in some settings BR-DRO objective can provably yield statistically efficient and less conservative solutions than unconstrained CVaR DRO.
    Explainable Attention for Few-shot Learning and Beyond. (arXiv:2310.07800v1 [cs.AI])
    Attention mechanisms have exhibited promising potential in enhancing learning models by identifying salient portions of input data. This is particularly valuable in scenarios where limited training samples are accessible due to challenges in data collection and labeling. Drawing inspiration from human recognition processes, we posit that an AI baseline's performance could be more accurate and dependable if it is exposed to essential segments of raw data rather than the entire input dataset, akin to human perception. However, the task of selecting these informative data segments, referred to as hard attention finding, presents a formidable challenge. In situations with few training samples, existing studies struggle to locate such informative regions due to the large number of training parameters that cannot be effectively learned from the available limited samples. In this study, we introduce a novel and practical framework for achieving explainable hard attention finding, specifically tailored for few-shot learning scenarios, called FewXAT. Our approach employs deep reinforcement learning to implement the concept of hard attention, directly impacting raw input data and thus rendering the process interpretable for human understanding. Through extensive experimentation across various benchmark datasets, we demonstrate the efficacy of our proposed method.
    Reinforcement Learning of Display Transfer Robots in Glass Flow Control Systems: A Physical Simulation-Based Approach. (arXiv:2310.07981v1 [cs.LG])
    A flow control system is a critical concept for increasing the production capacity of manufacturing systems. To solve the scheduling optimization problem related to flow control, with the aim of improving productivity, existing methods depend on heuristic designs by human domain experts. Therefore, the methods require correction, monitoring, and verification on real equipment. As system designs increase in complexity, the monitoring time increases, which decreases the probability of arriving at the optimal design. As an alternative to the heuristic design of flow control systems, the use of deep reinforcement learning to solve the scheduling optimization problem has been considered. Although existing research on reinforcement learning has yielded excellent performance in some areas, the applicability of the results to actual fabrication facilities (FABs), such as display and semiconductor manufacturing processes, is not yet evident. To this end, we propose a method to implement a physical simulation environment and devise a feasible flow control system design using a transfer robot in display manufacturing through reinforcement learning. We present a model and parameter settings to build a virtual environment for different display transfer robots, and reinforcement learning training methods on this environment to obtain an optimal schedule for glass flow control systems. Its feasibility was verified by using different types of robots used in the actual process.
    Identifying latent distances with Finslerian geometry. (arXiv:2212.10010v2 [cs.LG] UPDATED)
    Riemannian geometry provides us with powerful tools to explore the latent space of generative models while preserving the underlying structure of the data. The latent space can be equipped with a Riemannian metric, pulled back from the data manifold. With this metric, we can systematically navigate the space relying on geodesics defined as the shortest curves between two points. Generative models are often stochastic, causing the data space, the Riemannian metric, and the geodesics to be stochastic as well. Stochastic objects are at best impractical, and at worst impossible, to manipulate. A common solution is to approximate the stochastic pullback metric by its expectation. But the geodesics derived from this expected Riemannian metric do not correspond to the expected length-minimising curves. In this work, we propose another metric whose geodesics explicitly minimise the expected length of the pullback metric. We show this metric defines a Finsler metric, and we compare it with the expected Riemannian metric. In high dimensions, we prove that both metrics converge to each other at a rate of $O\left(\frac{1}{D}\right)$. This convergence implies that the established expected Riemannian metric is an accurate approximation of the theoretically more grounded Finsler metric. This provides justification for using the expected Riemannian metric in practical implementations.
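    For intuition, and only as a hedged reconstruction of the distinction sketched above (notation assumed, not copied from the paper): if the stochastic pullback metric at a latent point is a random matrix $G$, the two curve-length functionals differ in where the expectation is taken:

    $$ L_{\mathrm{exp.\,Riem.}}(\gamma) = \int_0^1 \sqrt{\dot\gamma(t)^\top\, \mathbb{E}\!\left[G(\gamma(t))\right] \dot\gamma(t)}\, \mathrm{d}t, \qquad L_{\mathrm{Finsler}}(\gamma) = \int_0^1 \mathbb{E}\!\left[\sqrt{\dot\gamma(t)^\top G(\gamma(t))\, \dot\gamma(t)}\right] \mathrm{d}t. $$

    Geodesics of the second functional minimise the expected curve length directly, and the induced norm $F(z, v) = \mathbb{E}\!\left[\sqrt{v^\top G(z)\, v}\right]$ is the Finsler metric referred to above; the stated $O\left(\frac{1}{D}\right)$ rate quantifies how quickly the two functionals agree in high dimensions.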
    Theoretical Hardness and Tractability of POMDPs in RL with Partial Online State Information. (arXiv:2306.08762v2 [cs.LG] UPDATED)
    Partially observable Markov decision processes (POMDPs) have been widely used to model many real-world applications. However, existing theoretical results have shown that learning in general POMDPs could be intractable, where the main challenge lies in the lack of latent state information. A key fundamental question here is how much online state information (OSI) is sufficient to achieve tractability. In this paper, we establish a lower bound that reveals a surprising hardness result: unless we have full OSI, we need an exponentially scaling sample complexity to obtain an $\epsilon$-optimal policy for POMDPs. Nonetheless, inspired by the key insights in our lower bound design, we find that there exist important tractable classes of POMDPs even with only partial OSI. In particular, for two novel classes of POMDPs with partial OSI, we provide new algorithms that are proved to be near-optimal by establishing new regret upper and lower bounds.
    Infinite Width Graph Neural Networks for Node Regression/ Classification. (arXiv:2310.08176v1 [cs.LG])
    This work analyzes Graph Neural Networks, a generalization of Fully-Connected Deep Neural Nets to graph-structured data, as their width, that is, the number of nodes in each fully-connected layer, increases to infinity. Infinite-width Neural Networks connect Deep Learning to Gaussian Processes and Kernels, both Machine Learning frameworks with long traditions and extensive theoretical foundations. Gaussian Processes and Kernels have far fewer hyperparameters than Neural Networks and can be used for uncertainty estimation, making them more user-friendly for applications. This work extends the growing body of research connecting Gaussian Processes and Kernels to Neural Networks. The Kernel and Gaussian Process closed forms are derived for a variety of architectures, namely the standard Graph Neural Network, the Graph Neural Network with Skip-Concatenate Connections and the Graph Attention Neural Network. All architectures are evaluated on a variety of datasets on the task of transductive Node Regression and Classification. Additionally, a Spectral Sparsification method known as Effective Resistance is used to improve runtime and memory requirements. Extending the setting to inductive graph learning tasks (Graph Regression/Classification) is straightforward and is briefly discussed in Section 3.5.
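    As a rough illustration of the kind of closed-form kernel referred to above, the following sketch computes an NNGP-style kernel for an infinitely wide ReLU GNN with normalized-adjacency aggregation; the scaling conventions and the architecture are assumptions for the example, not the paper's exact derivation.

```python
import numpy as np

def relu_nngp(K):
    """One infinite-width ReLU layer: the arc-cosine kernel applied entrywise."""
    d = np.sqrt(np.clip(np.diag(K), 1e-12, None))
    C = np.clip(K / np.outer(d, d), -1.0, 1.0)          # correlations
    theta = np.arccos(C)
    return np.outer(d, d) * (np.sin(theta) + (np.pi - theta) * C) / (2 * np.pi)

def gnn_gp_kernel(X, A_hat, depth=2):
    """NNGP kernel for a simple GNN: aggregate with A_hat, then a ReLU dense layer."""
    K = X @ X.T / X.shape[1]
    for _ in range(depth):
        K = A_hat @ K @ A_hat.T   # graph aggregation acts linearly on the kernel
        K = relu_nngp(K)          # infinite-width non-linearity
    return K

# GP posterior mean for node regression on labeled node indices `train`:
# f_test = K[test][:, train] @ np.linalg.solve(
#     K[train][:, train] + 1e-3 * np.eye(len(train)), y_train)
```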
    Tight Time-Space Lower Bounds for Constant-Pass Learning. (arXiv:2310.08070v1 [cs.LG])
    In his breakthrough paper, Raz showed that any parity learning algorithm requires either quadratic memory or an exponential number of samples [FOCS'16, JACM'19]. A line of work that followed extended this result to a large class of learning problems. Until recently, all these results considered learning in the streaming model, where each sample is drawn independently, and the learner is allowed a single pass over the stream of samples. Garg, Raz, and Tal [CCC'19] considered a stronger model, allowing multiple passes over the stream. In the $2$-pass model, they showed that learning parities of size $n$ requires either a memory of size $n^{1.5}$ or at least $2^{\sqrt{n}}$ samples. (Their result also generalizes to other learning problems.) In this work, for any constant $q$, we prove tight memory-sample lower bounds for any parity learning algorithm that makes $q$ passes over the stream of samples. We show that such a learner requires either $\Omega(n^{2})$ memory size or at least $2^{\Omega(n)}$ samples. Beyond establishing a tight lower bound, this is the first non-trivial lower bound for $q$-pass learning for any $q\ge 3$. Similar to prior work, our results extend to any learning problem with many nearly-orthogonal concepts. We complement the lower bound with an upper bound, showing that parity learning with $q$ passes can be done efficiently with $O(n^2/\log q)$ memory.
    AutoFHE: Automated Adaption of CNNs for Efficient Evaluation over FHE. (arXiv:2310.08012v1 [cs.LG])
    Secure inference of deep convolutional neural networks (CNNs) under RNS-CKKS involves polynomial approximation of unsupported non-linear activation functions. However, existing approaches have three main limitations: 1) Inflexibility: The polynomial approximation and associated homomorphic evaluation architecture are customized manually for each CNN architecture and do not generalize to other networks. 2) Suboptimal Approximation: Each activation function is approximated instead of the function represented by the CNN. 3) Restricted Design: Either high-degree or low-degree polynomial approximations are used. The former retains high accuracy but slows down inference due to bootstrapping operations, while the latter accelerates ciphertext inference but compromises accuracy. To address these limitations, we present AutoFHE, which automatically adapts standard CNNs for secure inference under RNS-CKKS. The key idea is to adopt layerwise mixed-degree polynomial activation functions, which are optimized jointly with the homomorphic evaluation architecture in terms of the placement of bootstrapping operations. The problem is modeled within a multi-objective optimization framework to maximize accuracy and minimize the number of bootstrapping operations. AutoFHE can be applied flexibly on any CNN architecture, and it provides diverse solutions that span the trade-off between accuracy and latency. Experimental evaluation over RNS-CKKS encrypted CIFAR datasets shows that AutoFHE accelerates secure inference by $1.32\times$ to $1.8\times$ compared to methods employing high-degree polynomials. It also improves accuracy by up to 2.56% compared to methods using low-degree polynomials. Lastly, AutoFHE accelerates inference and improves accuracy by $103\times$ and 3.46%, respectively, compared to CNNs under TFHE.
    Impact of multi-armed bandit strategies on deep recurrent reinforcement learning. (arXiv:2310.08331v1 [stat.ML])
    Incomplete knowledge of the environment leads an agent to make decisions under uncertainty. One of the major dilemmas in Reinforcement Learning (RL), where an autonomous agent must balance two contrasting needs when making its decisions, is the exploration-exploitation trade-off: exploiting the current knowledge of the environment to maximize the cumulative reward, while exploring actions that improve that knowledge and hopefully lead to higher rewards. A further issue concerns the full observability of the states, which cannot be assumed in all applications, for example when only 2D images are available as input to an RL approach that must find the optimal action within a 3D simulation environment. In this work, we address these issues by deploying and testing several techniques for balancing exploration and exploitation on partially observable systems, for the task of predicting steering commands in an autonomous driving scenario. More precisely, the final aim is to investigate the effects of using both stochastic and deterministic multi-armed bandit strategies coupled with a Deep Recurrent Q-Network. Additionally, we adapt and evaluate the impact of an innovative method to improve the learning phase of the underlying Convolutional Recurrent Neural Network. We aim to show that adaptive stochastic methods for exploration better approximate the exploration-exploitation trade-off, as Softmax and Max-Boltzmann strategies are, in general, able to outperform epsilon-greedy techniques.
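    For concreteness, the exploration rules compared above can be sketched as follows, applied to the Q-values of a recurrent Q-network; the temperature and epsilon values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy(q, eps=0.1):
    # Random action with probability eps, otherwise the greedy action.
    return int(rng.integers(len(q))) if rng.random() < eps else int(np.argmax(q))

def softmax_boltzmann(q, tau=0.5):
    # Sample an action from a temperature-scaled Boltzmann distribution over Q-values.
    p = np.exp((q - np.max(q)) / tau)   # shift by max for numerical stability
    p /= p.sum()
    return int(rng.choice(len(q), p=p))

def max_boltzmann(q, eps=0.1, tau=0.5):
    # Greedy with probability 1 - eps; otherwise a Boltzmann draw (not a uniform one).
    return softmax_boltzmann(q, tau) if rng.random() < eps else int(np.argmax(q))
```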
    Generative Intrinsic Optimization: Intrinsic Control with Model Learning. (arXiv:2310.08100v1 [cs.LG])
    A future sequence represents the outcome of executing an action in the environment. When driven by the information-theoretic concept of mutual information, an agent seeks maximally informative consequences. Explicit outcomes may vary across states, returns, or trajectories, serving different purposes such as credit assignment or imitation learning. However, the inherent nature of incorporating intrinsic motivation with reward maximization is often neglected. In this work, we propose a variational approach to jointly learn the quantities necessary for estimating the mutual information and the dynamics model, providing a general framework for incorporating different forms of outcomes of interest. Integrated into a policy iteration scheme, our approach guarantees convergence to the optimal policy. While we mainly focus on theoretical analysis, our approach opens up the possibility of leveraging intrinsic control with model learning to enhance sample efficiency and to incorporate the uncertainty of the environment into decision-making.
    Efficient Integrators for Diffusion Generative Models. (arXiv:2310.07894v1 [cs.LG])
    Diffusion models suffer from slow sample generation at inference time. Therefore, developing a principled framework for fast deterministic/stochastic sampling for a broader class of diffusion models is a promising direction. We propose two complementary frameworks for accelerating sample generation in pre-trained models: Conjugate Integrators and Splitting Integrators. Conjugate integrators generalize DDIM, mapping the reverse diffusion dynamics to a more amenable space for sampling. In contrast, splitting-based integrators, commonly used in molecular dynamics, reduce the numerical simulation error by cleverly alternating between numerical updates involving the data and auxiliary variables. After extensively studying these methods empirically and theoretically, we present a hybrid method that leads to the best-reported performance for diffusion models in augmented spaces. Applied to Phase Space Langevin Diffusion [Pandey & Mandt, 2023] on CIFAR-10, our deterministic and stochastic samplers achieve FID scores of 2.11 and 2.36 in only 100 network function evaluations (NFE) as compared to 2.57 and 2.63 for the best-performing baselines, respectively. Our code and model checkpoints will be made publicly available at \url{https://github.com/mandt-lab/PSLD}.
    The Thousand Faces of Explainable AI Along the Machine Learning Life Cycle: Industrial Reality and Current State of Research. (arXiv:2310.07882v1 [cs.LG])
    In this paper, we investigate the practical relevance of explainable artificial intelligence (XAI), with a special focus on the producing industries, and relate it to the current state of academic XAI research. Our findings are based on an extensive series of interviews regarding the role and applicability of XAI along the Machine Learning (ML) lifecycle in current industrial practice and its expected relevance in the future. The interviews were conducted among a great variety of roles and key stakeholders from different industry sectors. In addition, we outline the state of XAI research by providing a concise review of the relevant literature. This enables us to provide an encompassing overview covering the opinions of the surveyed persons as well as the current state of academic research. By comparing our interview results with the current research approaches, we reveal several discrepancies. While a multitude of different XAI approaches exists, most of them are centered around the model evaluation phase and data scientists. Their versatile capabilities for other stages are currently either not sufficiently explored or not popular among practitioners. In line with existing work, our findings also confirm that more effort is needed to enable non-expert users to interpret and understand opaque AI models with existing methods and frameworks.
    Seeing-Eye Quadruped Navigation with Force Responsive Locomotion Control. (arXiv:2309.04370v2 [cs.RO] UPDATED)
    Seeing-eye robots are very useful tools for guiding visually impaired people, potentially producing a huge societal impact given the low availability and high cost of real guide dogs. Although a few seeing-eye robot systems have already been demonstrated, none considered external tugs from humans, which frequently occur in a real guide dog setting. In this paper, we simultaneously train a locomotion controller that is robust to external tugging forces via Reinforcement Learning (RL), and an external force estimator via supervised learning. The controller ensures stable walking, and the force estimator enables the robot to respond to the external forces from the human. These forces are used to guide the robot to the global goal, which is unknown to the robot, while the robot guides the human around nearby obstacles via a local planner. Experimental results in simulation and on hardware show that our controller is robust to external forces, and our seeing-eye system can accurately detect force direction. We demonstrate our full seeing-eye robot system on a real quadruped robot with a blindfolded human. The video can be seen at our project page: https://bu-air-lab.github.io/guide_dog/
    LLM4TS: Two-Stage Fine-Tuning for Time-Series Forecasting with Pre-Trained LLMs. (arXiv:2308.08469v3 [cs.LG] UPDATED)
    In this work, we leverage pre-trained Large Language Models (LLMs) to enhance time-series forecasting. Mirroring the growing interest in unifying models for Natural Language Processing and Computer Vision, we envision creating an analogous model for long-term time-series forecasting. Due to limited large-scale time-series data for building robust foundation models, our approach LLM4TS focuses on leveraging the strengths of pre-trained LLMs. By combining time-series patching with temporal encoding, we have enhanced the capability of LLMs to handle time-series data effectively. Inspired by the supervised fine-tuning in chatbot domains, we prioritize a two-stage fine-tuning process: first conducting supervised fine-tuning to orient the LLM towards time-series data, followed by task-specific downstream fine-tuning. Furthermore, to unlock the flexibility of pre-trained LLMs without extensive parameter adjustments, we adopt several Parameter-Efficient Fine-Tuning (PEFT) techniques. Drawing on these innovations, LLM4TS has yielded state-of-the-art results in long-term forecasting. Our model has also shown exceptional capabilities as both a robust representation learner and an effective few-shot learner, thanks to the knowledge transferred from the pre-trained LLM.
    Multi-Objective Optimization for Sparse Deep Neural Network Training. (arXiv:2308.12243v2 [cs.LG] UPDATED)
    Different conflicting optimization criteria arise naturally in various Deep Learning scenarios. These can address different main tasks (i.e., in the setting of Multi-Task Learning), but also main and secondary tasks such as loss minimization versus sparsity. The usual approach is a simple weighting of the criteria, which formally only works in the convex setting. In this paper, we present a Multi-Objective Optimization algorithm using a modified Weighted Chebyshev scalarization for training Deep Neural Networks (DNNs) with respect to several tasks. By employing this scalarization technique, the algorithm can identify all optimal solutions of the original problem while reducing its complexity to a sequence of single-objective problems. The simplified problems are then solved using an Augmented Lagrangian method, enabling the use of popular optimization techniques such as Adam and Stochastic Gradient Descent while effectively handling constraints. Our work aims to address the (economic and ecological) sustainability issue of DNN models, with a particular focus on Deep Multi-Task models, which are typically designed with a very large number of weights to perform equally well on multiple tasks. Through experiments conducted on two Machine Learning datasets, we demonstrate the possibility of adaptively sparsifying the model during training without significantly impacting its performance, provided we are willing to apply task-specific adaptations to the network weights. Code is available at https://github.com/salomonhotegni/MDMTN.
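    As a hedged illustration of the scalarization mentioned above (not the authors' exact modified variant), a weighted Chebyshev objective over two criteria such as task loss and sparsity can be sketched as follows; the reference point, weights, and small augmentation term are assumptions made for the example.

```python
import torch

def weighted_chebyshev(objectives, weights, reference, rho=1e-3):
    """Scalarize objectives f as max_i w_i (f_i - z_i) + rho * sum_i w_i (f_i - z_i).

    The augmentation term (rho > 0) is a common device to avoid weakly
    Pareto-optimal solutions; the paper uses its own modified variant.
    """
    f = torch.stack(list(objectives))
    z = torch.as_tensor(reference, dtype=f.dtype)
    w = torch.as_tensor(weights, dtype=f.dtype)
    gaps = w * (f - z)
    return gaps.max() + rho * gaps.sum()

# Example trade-off between cross-entropy and an L1 sparsity measure of the weights:
# loss = weighted_chebyshev([ce_loss, l1_norm / num_params],
#                           weights=[0.7, 0.3], reference=[0.0, 0.0])
# loss.backward()
```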
    Continual Learning via Manifold Expansion Replay. (arXiv:2310.08038v1 [cs.LG])
    In continual learning, the learner learns multiple tasks in sequence, with data being acquired only once for each task. Catastrophic forgetting is a major challenge to continual learning. To reduce forgetting, some existing rehearsal-based methods use episodic memory to replay samples of previous tasks. However, in the process of knowledge integration when learning a new task, this strategy also suffers from catastrophic forgetting due to an imbalance between old and new knowledge. To address this problem, we propose a novel replay strategy called Manifold Expansion Replay (MaER). We argue that expanding the implicit manifold of the knowledge representation in the episodic memory helps to improve the robustness and expressiveness of the model. To this end, we propose a greedy strategy to keep increasing the diameter of the implicit manifold represented by the knowledge in the buffer during memory management. In addition, we introduce the Wasserstein distance, instead of cross entropy, as the distillation loss to preserve previous knowledge. With extensive experimental validation on MNIST, CIFAR10, CIFAR100, and TinyImageNet, we show that the proposed method significantly improves accuracy in the continual learning setup, outperforming the state of the art.
    MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback. (arXiv:2309.10691v2 [cs.CL] UPDATED)
    To solve complex tasks, large language models (LLMs) often require multiple rounds of interactions with the user, sometimes assisted by external tools. However, current evaluation protocols often emphasize benchmark performance with single-turn exchanges, neglecting the nuanced interactions among the user, LLMs, and external tools, while also underestimating the importance of natural language feedback from users. These oversights contribute to discrepancies between research benchmark evaluations and real-world use cases. We introduce MINT, a benchmark that evaluates LLMs' ability to solve tasks with multi-turn interactions by (1) using tools and (2) leveraging natural language feedback. To ensure reproducibility, we provide an evaluation framework where LLMs can access tools by executing Python code and receive users' natural language feedback simulated by GPT-4. We repurpose a diverse set of established evaluation datasets focusing on reasoning, coding, and decision-making and carefully curate them into a compact subset for efficient evaluation. Our analysis of 20 open- and closed-source LLMs offers intriguing findings. (a) LLMs generally benefit from tools and language feedback, with performance gains (absolute, same below) of 1-8% for each turn of tool use and 2-17% with natural language feedback. (b) Better single-turn performance does not guarantee better multi-turn performance. (c) Surprisingly, on the LLMs evaluated, supervised instruction-finetuning (SIFT) and reinforcement learning from human feedback (RLHF) generally hurt multi-turn capabilities. We expect MINT can help measure progress and incentivize research in improving LLMs' capabilities in multi-turn interactions, especially for open-source communities where multi-turn human evaluation can be less accessible compared to commercial LLMs with a larger user base.
    Explorable Mesh Deformation Subspaces from Unstructured Generative Models. (arXiv:2310.07814v1 [cs.GR])
    Exploring variations of 3D shapes is a time-consuming process in traditional 3D modeling tools. Deep generative models of 3D shapes often feature continuous latent spaces that can, in principle, be used to explore potential variations starting from a set of input shapes. In practice, doing so can be problematic: latent spaces are high dimensional and hard to visualize, contain shapes that are not relevant to the input shapes, and linear paths through them often lead to sub-optimal shape transitions. Furthermore, one would ideally be able to explore variations in the original high-quality meshes used to train the generative model, not its lower-quality output geometry. In this paper, we present a method to explore variations among a given set of landmark shapes by constructing a mapping from an easily-navigable 2D exploration space to a subspace of a pre-trained generative model. We first describe how to find a mapping that spans the set of input landmark shapes and exhibits smooth variations between them. We then show how to turn the variations in this subspace into deformation fields, to transfer those variations to high-quality meshes for the landmark shapes. Our results show that our method can produce visually-pleasing and easily-navigable 2D exploration spaces for several different shape categories, especially as compared to prior work on learning deformation spaces for 3D shapes.
    Accountability in Offline Reinforcement Learning: Explaining Decisions with a Corpus of Examples. (arXiv:2310.07747v1 [cs.LG])
    Learning transparent, interpretable controllers with offline data in decision-making systems is an essential area of research due to its potential to reduce the risk of applications in real-world systems. However, in responsibility-sensitive settings such as healthcare, decision accountability is of paramount importance, yet has not been adequately addressed by the literature. This paper introduces the Accountable Offline Controller (AOC) that employs the offline dataset as the Decision Corpus and performs accountable control based on a tailored selection of examples, referred to as the Corpus Subset. AOC operates effectively in low-data scenarios, can be extended to the strictly offline imitation setting, and displays qualities of both conservation and adaptability. We assess AOC's performance in both simulated and real-world healthcare scenarios, emphasizing its capability to manage offline control tasks with high levels of performance while maintaining accountability. Keywords: Interpretable Reinforcement Learning, Explainable Reinforcement Learning, Reinforcement Learning Transparency, Offline Reinforcement Learning, Batched Control.
    A Complete Recipe for Diffusion Generative Models. (arXiv:2303.01748v2 [cs.LG] UPDATED)
    Score-based Generative Models (SGMs) have demonstrated exceptional synthesis outcomes across various tasks. However, the current design landscape of the forward diffusion process remains largely untapped and often relies on physical heuristics or simplifying assumptions. Utilizing insights from the development of scalable Bayesian posterior samplers, we present a complete recipe for formulating forward processes in SGMs, ensuring convergence to the desired target distribution. Our approach reveals that several existing SGMs can be seen as specific manifestations of our framework. Building upon this method, we introduce Phase Space Langevin Diffusion (PSLD), which relies on score-based modeling within an augmented space enriched by auxiliary variables akin to physical phase space. Empirical results exhibit the superior sample quality and improved speed-quality trade-off of PSLD compared to various competing approaches on established image synthesis benchmarks. Remarkably, PSLD achieves sample quality akin to state-of-the-art SGMs (FID: 2.10 for unconditional CIFAR-10 generation). Lastly, we demonstrate the applicability of PSLD in conditional synthesis using pre-trained score networks, offering an appealing alternative as an SGM backbone for future advancements. Code and model checkpoints can be accessed at \url{https://github.com/mandt-lab/PSLD}.
    Interpreting Reward Models in RLHF-Tuned Language Models Using Sparse Autoencoders. (arXiv:2310.08164v1 [cs.LG])
    Large language models (LLMs) aligned to human preferences via reinforcement learning from human feedback (RLHF) underpin many commercial applications. However, how RLHF impacts LLM internals remains opaque. We propose a novel method to interpret learned reward functions in RLHF-tuned LLMs using sparse autoencoders. Our approach trains autoencoder sets on activations from a base LLM and its RLHF-tuned version. By comparing autoencoder hidden spaces, we identify unique features that reflect the accuracy of the learned reward model. To quantify this, we construct a scenario where the tuned LLM learns token-reward mappings to maximize reward. This is the first application of sparse autoencoders for interpreting learned rewards and broadly inspecting reward learning in LLMs. Our method provides an abstract approximation of reward integrity. This presents a promising technique for ensuring alignment between specified objectives and model behaviors.
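    A minimal sketch of the comparison described above: train one sparse autoencoder per model on its activations, then match decoder directions across the two learned dictionaries. The architecture, the L1 penalty, and the cosine-similarity matching are illustrative assumptions rather than the authors' exact procedure.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete autoencoder with an L1 penalty on the hidden codes."""
    def __init__(self, d_model, d_hidden, l1=1e-3):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)
        self.dec = nn.Linear(d_hidden, d_model)
        self.l1 = l1

    def forward(self, acts):
        codes = torch.relu(self.enc(acts))
        recon = self.dec(codes)
        loss = (recon - acts).pow(2).mean() + self.l1 * codes.abs().mean()
        return loss, codes

def match_features(sae_base, sae_rlhf):
    # Cosine similarity between decoder directions; RLHF features with low
    # best-match similarity are candidates for reward-related directions
    # absent from the base model's dictionary.
    Wb = nn.functional.normalize(sae_base.dec.weight.T, dim=1)   # (d_hidden, d_model)
    Wr = nn.functional.normalize(sae_rlhf.dec.weight.T, dim=1)
    sims = Wr @ Wb.T
    return sims.max(dim=1).values   # per-RLHF-feature best match in the base dictionary
```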
    Score Regularized Policy Optimization through Diffusion Behavior. (arXiv:2310.07297v2 [cs.LG] UPDATED)
    Recent developments in offline reinforcement learning have uncovered the immense potential of diffusion modeling, which excels at representing heterogeneous behavior policies. However, sampling from diffusion policies is considerably slow because it necessitates tens to hundreds of iterative inference steps for one action. To address this issue, we propose to extract an efficient deterministic inference policy from critic models and pretrained diffusion behavior models, leveraging the latter to directly regularize the policy gradient with the behavior distribution's score function during optimization. Our method enjoys powerful generative capabilities of diffusion modeling while completely circumventing the computationally intensive and time-consuming diffusion sampling scheme, both during training and evaluation. Extensive results on D4RL tasks show that our method boosts action sampling speed by more than 25 times compared with various leading diffusion-based methods in locomotion tasks, while still maintaining state-of-the-art performance.
    Only Pay for What Is Uncertain: Variance-Adaptive Thompson Sampling. (arXiv:2303.09033v2 [cs.LG] UPDATED)
    Most bandit algorithms assume that the reward variances or their upper bounds are known, and that they are the same for all arms. This naturally leads to suboptimal performance and higher regret due to variance overestimation. On the other hand, underestimated reward variances may lead to linear regret due to committing early to a suboptimal arm. This motivated prior works on variance-adaptive frequentist algorithms, which have strong instance-dependent regret bounds but cannot incorporate prior knowledge on reward variances. We lay foundations for the Bayesian setting, which incorporates prior knowledge. This results in lower regret in practice, due to using the prior in the algorithm design, and also improved regret guarantees. Specifically, we study Gaussian bandits with {unknown heterogeneous reward variances}, and develop a Thompson sampling algorithm with prior-dependent Bayes regret bounds. We achieve lower regret with lower reward variances and more informative priors on them, which is precisely why we pay only for what is uncertain. This is the first result of its kind. Finally, we corroborate our theory with extensive experiments, which show the superiority of our variance-adaptive Bayesian algorithm over prior frequentist approaches. We also show that our approach is robust to model misspecification and can be applied with estimated priors.
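    A minimal sketch of variance-adaptive Thompson sampling for Gaussian arms with unknown, heterogeneous variances, using a conjugate Normal-Inverse-Gamma prior per arm; the prior parameters are illustrative assumptions, and the paper's algorithm and guarantees may differ in detail.

```python
import numpy as np

rng = np.random.default_rng(0)

class NIGArm:
    """Normal-Inverse-Gamma posterior over (mean, variance) of one arm's rewards."""
    def __init__(self, mu0=0.0, kappa0=1.0, alpha0=2.0, beta0=1.0):
        self.mu, self.kappa, self.alpha, self.beta = mu0, kappa0, alpha0, beta0

    def update(self, x):
        # Conjugate update for a single observation x (order matters: use old mu/kappa).
        mu_n = (self.kappa * self.mu + x) / (self.kappa + 1)
        self.beta += 0.5 * self.kappa * (x - self.mu) ** 2 / (self.kappa + 1)
        self.mu, self.kappa, self.alpha = mu_n, self.kappa + 1, self.alpha + 0.5

    def sample_mean(self):
        var = 1.0 / rng.gamma(self.alpha, 1.0 / self.beta)      # sigma^2 ~ InvGamma(alpha, beta)
        return rng.normal(self.mu, np.sqrt(var / self.kappa))   # mu ~ N(mu_n, sigma^2 / kappa_n)

def thompson_step(arms, pull):
    a = int(np.argmax([arm.sample_mean() for arm in arms]))  # sample, then act greedily
    arms[a].update(pull(a))
    return a
```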
    Extreme Image Transformations Facilitate Robust Latent Object Representations. (arXiv:2310.07725v1 [cs.LG])
    Adversarial attacks can affect the object recognition capabilities of machines in the wild. These attacks often result from spurious correlations between inputs and class labels, and are prone to memorization in large networks. While networks are expected to perform automated feature selection, this selection is not effective at the scale of the object. Humans, however, are able to select the minimum set of features required to form a robust representation of an object. In this work, we show that finetuning any pretrained off-the-shelf network with Extreme Image Transformations (EIT) not only helps in learning a robust latent representation but also improves the performance of these networks against common adversarial attacks of various intensities. Our EIT-trained networks show strong activations in the object regions even when tested with more intense noise, showing promising generalization across different kinds of adversarial attacks.
    Physics Constrained Unsupervised Deep Learning for Rapid, High Resolution Scanning Coherent Diffraction Reconstruction. (arXiv:2306.11014v2 [physics.comp-ph] UPDATED)
    By circumventing the resolution limitations of optics, coherent diffractive imaging (CDI) and ptychography are making their way into scientific fields ranging from X-ray imaging to astronomy. Yet, the need for time-consuming iterative phase recovery hampers real-time imaging. While supervised deep learning strategies have increased reconstruction speed, they sacrifice image quality. Furthermore, these methods' demand for extensive labeled training data is experimentally burdensome. Here, we propose an unsupervised physics-informed neural network reconstruction method, PtychoPINN, that retains the 100-to-1000-fold speedup of deep learning-based reconstruction while improving reconstruction quality by combining the diffraction forward map with real-space constraints from overlapping measurements. In particular, PtychoPINN significantly advances generalizability, accuracy (with a typical 10 dB PSNR increase), and linear resolution (2- to 6-fold gain). This blend of performance and speed offers exciting prospects for high-resolution real-time imaging in high-throughput environments such as X-ray free electron lasers (XFELs) and diffraction-limited light sources.
    In-Context Unlearning: Language Models as Few Shot Unlearners. (arXiv:2310.07579v2 [cs.LG] UPDATED)
    Machine unlearning, the study of efficiently removing the impact of specific training points on the trained model, has garnered increased attention of late, driven by the need to comply with privacy regulations like the Right to be Forgotten. Although unlearning is particularly relevant for LLMs in light of the copyright issues they raise, achieving precise unlearning is computationally infeasible for very large models. To this end, recent work has proposed several algorithms which approximate the removal of training data without retraining the model. These algorithms crucially rely on access to the model parameters in order to update them, an assumption that may not hold in practice due to computational constraints or when the LLM is accessed via API. In this work, we propose a new class of unlearning methods for LLMs we call ''In-Context Unlearning'', providing inputs in context and without having to update model parameters. To unlearn a particular training instance, we provide the instance alongside a flipped label and additional correctly labelled instances which are prepended as inputs to the LLM at inference time. Our experimental results demonstrate that these contexts effectively remove specific information from the training set while maintaining performance levels that are competitive with (or in some cases exceed) state-of-the-art unlearning methods that require access to the LLM parameters.
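    A minimal sketch of how such an unlearning context might be assembled for a sentiment task; the template and the label-flipping rule follow the description above, but the exact prompt format is an assumption.

```python
def build_icul_prompt(forget_example, support_examples, query,
                      labels=("negative", "positive")):
    """Prepend the point to be 'unlearned' with a flipped label, followed by
    correctly labelled examples, then the query to be classified."""
    text, true_label = forget_example
    flipped = labels[1 - labels.index(true_label)]
    lines = [f"Review: {text}\nSentiment: {flipped}"]          # flipped label for the forget point
    lines += [f"Review: {t}\nSentiment: {y}" for t, y in support_examples]  # correct labels
    lines += [f"Review: {query}\nSentiment:"]
    return "\n\n".join(lines)

# prompt = build_icul_prompt(("Great movie!", "positive"),
#                            [("Dull and slow.", "negative"), ("Loved it.", "positive")],
#                            "A waste of time.")
```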
    Lifelong Audio-video Masked Autoencoder with Forget-robust Localized Alignments. (arXiv:2310.08204v1 [cs.CV])
    We present a lifelong audio-video masked autoencoder that continually learns the multimodal representations from a video stream containing audio-video pairs, while its distribution continually shifts over time. Specifically, we propose two novel ideas to tackle the problem: (1) Localized Alignment: We introduce a small trainable multimodal encoder that predicts the audio and video tokens that are well-aligned with each other. This allows the model to learn only the highly correlated audiovisual patches with accurate multimodal relationships. (2) Forget-robust multimodal patch selection: We compare the relative importance of each audio-video patch between the current and past data pair to mitigate unintended drift of the previously learned audio-video representations. Our proposed method, FLAVA (Forget-robust Localized Audio-Video Alignment), therefore, captures the complex relationships between the audio and video modalities during training on a sequence of pre-training tasks while alleviating the forgetting of learned audiovisual correlations. Our experiments validate that FLAVA outperforms the state-of-the-art continual learning methods on several benchmark datasets under continual audio-video representation learning scenarios.
    Impact of Co-occurrence on Factual Knowledge of Large Language Models. (arXiv:2310.08256v1 [cs.CL])
    Large language models (LLMs) often make factually incorrect responses despite their success in various applications. In this paper, we hypothesize that relying heavily on simple co-occurrence statistics of the pre-training corpora is one of the main factors that cause factual errors. Our results reveal that LLMs are vulnerable to the co-occurrence bias, defined as preferring frequently co-occurred words over the correct answer. Consequently, LLMs struggle to recall facts whose subject and object rarely co-occur in the pre-training dataset although they are seen during finetuning. We show that co-occurrence bias remains despite scaling up model sizes or finetuning. Therefore, we suggest finetuning on a debiased dataset to mitigate the bias by filtering out biased samples whose subject-object co-occurrence count is high. Although debiased finetuning allows LLMs to memorize rare facts in the training set, it is not effective in recalling rare facts unseen during finetuning. Further research in mitigation will help build reliable language models by preventing potential errors. The code is available at \url{https://github.com/CheongWoong/impact_of_cooccurrence}.
    Observatory: Characterizing Embeddings of Relational Tables. (arXiv:2310.07736v1 [cs.DB])
    Language models and specialized table embedding models have recently demonstrated strong performance on many tasks over tabular data. Researchers and practitioners are keen to leverage these models in many new application contexts; but limited understanding of the strengths and weaknesses of these models, and the table representations they generate, makes the process of finding a suitable model for a given task reliant on trial and error. There is an urgent need to gain a comprehensive understanding of these models to minimize inefficiency and failures in downstream usage. To address this need, we propose Observatory, a formal framework to systematically analyze embedding representations of relational tables. Motivated both by invariants of the relational data model and by statistical considerations regarding data distributions, we define eight primitive properties, and corresponding measures to quantitatively characterize table embeddings for these properties. Based on these properties, we define an extensible framework to evaluate language and table embedding models. We collect and synthesize a suite of datasets and use Observatory to analyze seven such models. Our analysis provides insights into the strengths and weaknesses of learned representations over tables. We find, for example, that some models are sensitive to table structure such as column order, that functional dependencies are rarely reflected in embeddings, and that specialized table embedding models have relatively lower sample fidelity. Such insights help researchers and practitioners better anticipate model behaviors and select appropriate models for their downstream tasks, while guiding researchers in the development of new models.
    Understanding Sparse Feature Updates in Deep Networks using Iterative Linearisation. (arXiv:2211.12345v4 [cs.LG] UPDATED)
    Larger and deeper networks generalise well despite their increased capacity to overfit. Understanding why this happens is theoretically and practically important. One recent approach looks at the infinitely wide limits of such networks and their corresponding kernels. However, these theoretical tools cannot fully explain finite networks as the empirical kernel changes significantly during gradient-descent-based training in contrast to infinite networks. In this work, we derive an iterative linearised training method as a novel empirical tool to further investigate this distinction, allowing us to control for sparse (i.e. infrequent) feature updates and quantify the frequency of feature learning needed to achieve comparable performance. We justify iterative linearisation as an interpolation between a finite analog of the infinite width regime, which does not learn features, and standard gradient descent training, which does. Informally, we also show that it is analogous to a damped version of the Gauss-Newton algorithm -- a second-order method. We show that in a variety of cases, iterative linearised training surprisingly performs on par with standard training, noting in particular how much less frequent feature learning is required to achieve comparable performance. We also show that feature learning is essential for good performance. Since such feature learning inevitably causes changes in the NTK kernel, we provide direct negative evidence for the NTK theory, which states the NTK kernel remains constant during training.
    Dealing with zero-inflated data: achieving SOTA with a two-fold machine learning approach. (arXiv:2310.08088v1 [cs.LG])
    In many cases, a machine learning model must learn to correctly predict a few data points with particular values of interest in a broader range of data where many target values are zero. Zero-inflated data can be found in diverse scenarios, such as lumpy and intermittent demands, power consumption for home appliances being turned on and off, impurities measurement in distillation processes, and even airport shuttle demand prediction. The presence of zeroes affects the models' learning and may result in poor performance. Furthermore, zeroes also distort the metrics used to compute the model's prediction quality. This paper showcases two real-world use cases (home appliance classification and airport shuttle demand prediction) where a hierarchical model applied in the context of zero-inflated data leads to excellent results. In particular, for home appliance classification, the weighted averages of Precision, Recall, F1, and AUC ROC were increased by 27%, 34%, 49%, and 27%, respectively. Furthermore, it is estimated that the proposed approach is also four times more energy efficient than the SOTA approach against which it was compared. Two-fold models performed best in all cases when predicting airport shuttle demand, and the difference against other models has been proven to be statistically significant.  ( 2 min )
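    A minimal sketch of a two-fold (hierarchical) model of the kind described above: a classifier decides whether the target is zero, and a regressor fitted only on the non-zero cases predicts the magnitude; the particular estimators are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

class TwoFoldZeroInflated:
    def __init__(self):
        self.clf = GradientBoostingClassifier()   # stage 1: zero vs. non-zero
        self.reg = GradientBoostingRegressor()    # stage 2: magnitude given non-zero

    def fit(self, X, y):
        nonzero = y != 0
        self.clf.fit(X, nonzero.astype(int))
        self.reg.fit(X[nonzero], y[nonzero])
        return self

    def predict(self, X):
        p_nonzero = self.clf.predict_proba(X)[:, 1]
        # Expected value; threshold p_nonzero instead if hard zeros are required.
        return p_nonzero * self.reg.predict(X)
```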
    Invisible Threats: Backdoor Attack in OCR Systems. (arXiv:2310.08259v1 [cs.CR])
    Optical Character Recognition (OCR) is a widely used tool to extract text from scanned documents. Today, the state of the art is achieved by exploiting deep neural networks. However, this performance comes at the price of system vulnerability. For instance, in backdoor attacks, attackers compromise the training phase by inserting a backdoor into the victim's model that will be activated at testing time by specific patterns while leaving the overall model performance intact. This work proposes a backdoor attack for OCR that results in the injection of non-readable characters from malicious input images. This simple but effective attack exposes a weakness of state-of-the-art OCR, making the extracted text correct to human eyes but simultaneously unusable for the NLP applications that use OCR as a preprocessing step. Experimental results show that the attacked models successfully output non-readable characters for around 90% of the poisoned instances without harming their performance on the remaining instances.
    Adaptive Optimizers with Sparse Group Lasso for Neural Networks in CTR Prediction. (arXiv:2107.14432v4 [cs.LG] UPDATED)
    We develop a novel framework that adds the regularizers of the sparse group lasso to a family of adaptive optimizers in deep learning, such as Momentum, Adagrad, Adam, AMSGrad, AdaHessian, and create a new class of optimizers, which are named Group Momentum, Group Adagrad, Group Adam, Group AMSGrad and Group AdaHessian, etc., accordingly. We establish theoretically proven convergence guarantees in the stochastic convex settings, based on primal-dual methods. We evaluate the regularized effect of our new optimizers on three large-scale real-world ad click datasets with state-of-the-art deep learning models. The experimental results reveal that compared with the original optimizers with the post-processing procedure which uses the magnitude pruning method, the performance of the models can be significantly improved on the same sparsity level. Furthermore, in comparison to the cases without magnitude pruning, our methods can achieve extremely high sparsity with significantly better or highly competitive performance. The code is available at https://github.com/intelligent-machine-learning/dlrover/blob/master/tfplus.  ( 3 min )
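    A minimal sketch of how a sparse-group-lasso penalty can be attached to an existing adaptive optimizer via a proximal (soft-thresholding) step after each update; grouping weights by output neuron and the penalty strengths are illustrative assumptions, and the paper's regularized optimizers rest on their own primal-dual derivation rather than this simple post-step prox.

```python
import torch

@torch.no_grad()
def sparse_group_lasso_prox(weight, lr, lam_group=1e-4, lam_l1=1e-5, dim=1):
    """Elementwise soft-thresholding (L1) followed by group soft-thresholding
    (L2 over each row, i.e. all weights feeding one output neuron)."""
    weight.copy_(torch.sign(weight) * torch.clamp(weight.abs() - lr * lam_l1, min=0.0))
    norms = weight.norm(dim=dim, keepdim=True)
    scale = torch.clamp(1.0 - lr * lam_group / (norms + 1e-12), min=0.0)
    weight.mul_(scale)

# Usage after any base optimizer step (Adam, Adagrad, Momentum, ...):
# optimizer.step()
# for layer in model.modules():
#     if isinstance(layer, torch.nn.Linear):
#         sparse_group_lasso_prox(layer.weight, lr=optimizer.param_groups[0]["lr"])
```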
    GIO: Gradient Information Optimization for Training Dataset Selection. (arXiv:2306.11670v2 [cs.LG] UPDATED)
    It is often advantageous to train models on a subset of the available train examples, because the examples are of variable quality or because one would like to train with fewer examples, without sacrificing performance. We present Gradient Information Optimization (GIO), a scalable, task-agnostic approach to this data selection problem that requires only a small set of (unlabeled) examples representing a target distribution. GIO begins from a natural, information-theoretic objective that is intractable in practice. Our contribution is in showing that it can be made highly scalable through a simple relaxation of the objective and a highly efficient implementation. In experiments with machine translation, spelling correction, and image recognition, we show that GIO delivers outstanding results with very small train sets. These findings are robust to different representation models and hyperparameters for GIO itself. GIO is task- and domain-agnostic and can be applied out-of-the-box to new datasets and domains.
    A Theory of Non-Linear Feature Learning with One Gradient Step in Two-Layer Neural Networks. (arXiv:2310.07891v1 [stat.ML])
    Feature learning is thought to be one of the fundamental reasons for the success of deep neural networks. It is rigorously known that in two-layer fully-connected neural networks under certain conditions, one step of gradient descent on the first layer followed by ridge regression on the second layer can lead to feature learning; characterized by the appearance of a separated rank-one component -- spike -- in the spectrum of the feature matrix. However, with a constant gradient descent step size, this spike only carries information from the linear component of the target function and therefore learning non-linear components is impossible. We show that with a learning rate that grows with the sample size, such training in fact introduces multiple rank-one components, each corresponding to a specific polynomial feature. We further prove that the limiting large-dimensional and large sample training and test errors of the updated neural networks are fully characterized by these spikes. By precisely analyzing the improvement in the loss, we demonstrate that these non-linear features can enhance learning.
    QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models. (arXiv:2310.08041v1 [cs.CL])
    Large Language Models (LLMs) excel in NLP, but their demands hinder their widespread deployment. While Quantization-Aware Training (QAT) offers a solution, its extensive training costs make Post-Training Quantization (PTQ) a more practical approach for LLMs. In existing studies, activation outliers in particular channels are identified as the bottleneck to PTQ accuracy. They propose to transform the magnitudes from activations to weights, which however offers limited alleviation or suffers from unstable gradients, resulting in a severe performance drop at low-bitwidth. In this paper, we propose QLLM, an accurate and efficient low-bitwidth PTQ method designed for LLMs. QLLM introduces an adaptive channel reassembly technique that reallocates the magnitude of outliers to other channels, thereby mitigating their impact on the quantization range. This is achieved by channel disassembly and channel assembly, which first breaks down the outlier channels into several sub-channels to ensure a more balanced distribution of activation magnitudes. Then similar channels are merged to maintain the original channel number for efficiency. Additionally, an adaptive strategy is designed to autonomously determine the optimal number of sub-channels for channel disassembly. To further compensate for the performance loss caused by quantization, we propose an efficient tuning method that only learns a small number of low-rank weights while freezing the pre-trained quantized model. After training, these low-rank parameters can be fused into the frozen weights without affecting inference. Extensive experiments on LLaMA-1 and LLaMA-2 show that QLLM can obtain accurate quantized models efficiently. For example, QLLM quantizes the 4-bit LLaMA-2-70B within 10 hours on a single A100-80G GPU, outperforming the previous state-of-the-art method by 7.89% on the average accuracy across five zero-shot tasks.  ( 3 min )
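    A small numeric sketch of the channel disassembly idea described above: splitting an outlier input channel into several sub-channels, each carrying an equal share of the activation and reusing the same weight column, leaves the layer output unchanged while shrinking the per-channel magnitude. The similar-channel merging and the adaptive choice of the number of sub-channels are omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))          # (out_features, in_channels)
x = rng.normal(size=4); x[2] = 50.0  # channel 2 is an activation outlier

k = 4                                 # split the outlier channel into k sub-channels
W_split = np.concatenate([W, np.repeat(W[:, 2:3], k - 1, axis=1)], axis=1)
x_split = np.concatenate([x, np.full(k - 1, x[2] / k)])
x_split[2] = x[2] / k

assert np.allclose(W @ x, W_split @ x_split)    # layer output is preserved exactly
print(np.abs(x).max(), np.abs(x_split).max())   # activation range shrinks: 50.0 -> 12.5
```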
    TriRE: A Multi-Mechanism Learning Paradigm for Continual Knowledge Retention and Promotion. (arXiv:2310.08217v1 [cs.AI])
    Continual learning (CL) has remained a persistent challenge for deep neural networks due to catastrophic forgetting (CF) of previously learned tasks. Several techniques such as weight regularization, experience rehearsal, and parameter isolation have been proposed to alleviate CF. Despite their relative success, these research directions have predominantly remained orthogonal and suffer from several shortcomings, while missing out on the advantages of competing strategies. On the contrary, the brain continually learns, accommodates, and transfers knowledge across tasks by simultaneously leveraging several neurophysiological processes, including neurogenesis, active forgetting, neuromodulation, metaplasticity, experience rehearsal, and context-dependent gating, rarely resulting in CF. Inspired by how the brain exploits multiple mechanisms concurrently, we propose TriRE, a novel CL paradigm that encompasses retaining the most prominent neurons for each task, revising and solidifying the extracted knowledge of current and past tasks, and actively promoting less active neurons for subsequent tasks through rewinding and relearning. Across CL settings, TriRE significantly reduces task interference and surpasses different CL approaches considered in isolation.  ( 2 min )
    Improving Fast Minimum-Norm Attacks with Hyperparameter Optimization. (arXiv:2310.08177v1 [cs.LG])
    Evaluating the adversarial robustness of machine learning models using gradient-based attacks is challenging. In this work, we show that hyperparameter optimization can improve fast minimum-norm attacks by automating the selection of the loss function, the optimizer and the step-size scheduler, along with the corresponding hyperparameters. Our extensive evaluation involving several robust models demonstrates the improved efficacy of fast minimum-norm attacks when coupled with hyperparameter optimization. We release our open-source code at https://github.com/pralab/HO-FMN.  ( 2 min )
    Data-Centric Learning from Unlabeled Graphs with Diffusion Model. (arXiv:2303.10108v2 [cs.LG] UPDATED)
    Graph property prediction tasks are important and numerous. While each task offers only a small number of labeled examples, unlabeled graphs have been collected from various sources and at a large scale. A conventional approach is training a model with the unlabeled graphs on self-supervised tasks and then fine-tuning the model on the prediction tasks. However, the self-supervised task knowledge may not be aligned, and can sometimes conflict, with what the predictions need. In this paper, we propose to extract the knowledge underlying the large set of unlabeled graphs as a specific set of useful data points to augment each property prediction model. We use a diffusion model to fully utilize the unlabeled graphs and design two new objectives to guide the model's denoising process with each task's labeled data to generate task-specific graph examples and their labels. Experiments demonstrate that our data-centric approach performs significantly better than fifteen existing methods on fifteen tasks. Unlike in self-supervised learning, the performance improvement brought by the unlabeled data is directly attributable to the generated labeled examples.  ( 2 min )
    L2P: Learning to Place for Estimating Heavy-Tailed Distributed Outcomes. (arXiv:1908.04628v3 [cs.LG] UPDATED)
    Many real-world prediction tasks have outcome variables that have characteristic heavy-tail distributions. Examples include copies of books sold, auction prices of art pieces, demand for commodities in warehouses, etc. By learning heavy-tailed distributions, "big and rare" instances (e.g., the best-sellers) will have accurate predictions. Most existing approaches are not dedicated to learning heavy-tailed distribution; thus, they heavily under-predict such instances. To tackle this problem, we introduce Learning to Place (L2P), which exploits the pairwise relationships between instances for learning. In its training phase, L2P learns a pairwise preference classifier: is instance A > instance B? In its placing phase, L2P obtains a prediction by placing the new instance among the known instances. Based on its placement, the new instance is then assigned a value for its outcome variable. Experiments on real data show that L2P outperforms competing approaches in terms of accuracy and ability to reproduce heavy-tailed outcome distribution. In addition, L2P provides an interpretable model by placing each predicted instance in relation to its comparable neighbors. Interpretable models are highly desirable when lives and treasure are at stake.
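    A minimal sketch of the two phases described above, a pairwise preference classifier ("is instance A > instance B?") followed by placement of a new instance among the known ones; the classifier, the feature pairing, and the crude win-rate placement rule are illustrative assumptions rather than the authors' exact procedure.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_pairwise(X, y, n_pairs=20000, rng=np.random.default_rng(0)):
    """Training phase: learn 'is instance A > instance B?' from sampled pairs."""
    i = rng.integers(len(X), size=n_pairs)
    j = rng.integers(len(X), size=n_pairs)
    keep = y[i] != y[j]
    feats = np.hstack([X[i][keep], X[j][keep]])        # features of (A, B)
    labels = (y[i][keep] > y[j][keep]).astype(int)
    return LogisticRegression(max_iter=1000).fit(feats, labels)

def place(clf, X_known, y_known, x_new):
    """Placing phase: compare the new instance against known instances and
    read its outcome off the induced position (a crude win-rate placement)."""
    pairs = np.hstack([np.tile(x_new, (len(X_known), 1)), X_known])
    wins = clf.predict_proba(pairs)[:, 1]
    order = np.argsort(y_known)
    rank = int(np.round(wins.mean() * (len(X_known) - 1)))
    return y_known[order][rank]
```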
    ETDock: A Novel Equivariant Transformer for Protein-Ligand Docking. (arXiv:2310.08061v1 [q-bio.BM])
    Predicting the docking between proteins and ligands is a crucial and challenging task for drug discovery. However, traditional docking methods mainly rely on scoring functions, and deep learning-based docking approaches usually neglect the 3D spatial information of proteins and ligands, as well as the graph-level features of ligands, which limits their performance. To address these limitations, we propose an equivariant transformer neural network for protein-ligand docking pose prediction. Our approach involves the fusion of ligand graph-level features by feature processing, followed by the learning of ligand and protein representations using our proposed TAMformer module. Additionally, we employ an iterative optimization approach based on the predicted distance matrix to generate refined ligand poses. The experimental results on real datasets show that our model can achieve state-of-the-art performance.  ( 2 min )
    Lag-Llama: Towards Foundation Models for Time Series Forecasting. (arXiv:2310.08278v1 [cs.LG])
    Aiming to build foundation models for time-series forecasting and study their scaling behavior, we present here our work-in-progress on Lag-Llama, a general-purpose univariate probabilistic time-series forecasting model trained on a large collection of time-series data. The model shows good zero-shot prediction capabilities on unseen "out-of-distribution" time-series datasets, outperforming supervised baselines. We use smoothly broken power-laws to fit and predict model scaling behavior. The open source code is made available at https://github.com/kashif/pytorch-transformer-ts.
    Rethinking Large-scale Pre-ranking System: Entire-chain Cross-domain Models. (arXiv:2310.08039v1 [cs.IR])
    Industrial systems such as recommender systems and online advertising have been widely equipped with multi-stage architectures, which are divided into several cascaded modules, including matching, pre-ranking, ranking and re-ranking. As a critical bridge between matching and ranking, existing pre-ranking approaches mainly suffer from the sample selection bias (SSB) problem owing to ignoring the entire-chain data dependence, resulting in sub-optimal performance. In this paper, we rethink the pre-ranking system from the perspective of the entire sample space, and propose Entire-chain Cross-domain Models (ECM), which leverage samples from the whole cascaded stages to effectively alleviate the SSB problem. Besides, we design a fine-grained neural structure named ECMM to further improve the pre-ranking accuracy. Specifically, we propose a cross-domain multi-tower neural network to comprehensively predict each stage's result, and introduce a sub-network routing strategy with $L_0$ regularization to reduce computational costs. Evaluations on real-world large-scale traffic logs demonstrate that our pre-ranking models outperform SOTA methods while time consumption is maintained within an acceptable level, achieving a better trade-off between efficiency and effectiveness.
    LightZero: A Unified Benchmark for Monte Carlo Tree Search in General Sequential Decision Scenarios. (arXiv:2310.08348v1 [cs.LG])
    Building agents based on tree-search planning capabilities with learned models has achieved remarkable success in classic decision-making problems, such as Go and Atari. However, it has been deemed challenging or even infeasible to extend Monte Carlo Tree Search (MCTS) based algorithms to diverse real-world applications, especially when these environments involve complex action spaces and significant simulation costs, or inherent stochasticity. In this work, we introduce LightZero, the first unified benchmark for deploying MCTS/MuZero in general sequential decision scenarios. Specifically, we summarize the most critical challenges in designing a general MCTS-style decision-making solver, then decompose the tightly-coupled algorithm and system design of tree-search RL methods into distinct sub-modules. By incorporating more appropriate exploration and optimization strategies, we can significantly enhance these sub-modules and construct powerful LightZero agents to tackle tasks across a wide range of domains, such as board games, Atari, MuJoCo, MiniGrid and GoBigger. Detailed benchmark results reveal the significant potential of such methods in building scalable and efficient decision intelligence. The code is available as part of OpenDILab at https://github.com/opendilab/LightZero.  ( 2 min )
    Why Train More? Effective and Efficient Membership Inference via Memorization. (arXiv:2310.08015v1 [cs.LG])
    Membership Inference Attacks (MIAs) aim to identify specific data samples within the private training dataset of machine learning models, leading to serious privacy violations and other sophisticated threats. Many practical black-box MIAs require query access to the data distribution (the same distribution where the private data is drawn) to train shadow models. By doing so, the adversary obtains models trained "with" or "without" samples drawn from the distribution, and analyzes the characteristics of the samples under consideration. The adversary is often required to train more than hundreds of shadow models to extract the signals needed for MIAs; this becomes the computational overhead of MIAs. In this paper, we propose that by strategically choosing the samples, MI adversaries can maximize their attack success while minimizing the number of shadow models. First, our motivational experiments suggest memorization as the key property explaining disparate sample vulnerability to MIAs. We formalize this through a theoretical bound that connects MI advantage with memorization. Second, we show sample complexity bounds that connect the number of shadow models needed for MIAs with memorization. Lastly, we confirm our theoretical arguments with comprehensive experiments; by utilizing samples with high memorization scores, the adversary can (a) significantly improve its efficacy regardless of the MIA used, and (b) reduce the number of shadow models by nearly two orders of magnitude compared to state-of-the-art approaches.  ( 2 min )
    NeRF2: Neural Radio-Frequency Radiance Fields. (arXiv:2305.06118v2 [cs.NI] UPDATED)
    Although Maxwell discovered the physical laws of electromagnetic waves 160 years ago, how to precisely model the propagation of an RF signal in an electrically large and complex environment remains a long-standing problem. The difficulty is in the complex interactions between the RF signal and the obstacles (e.g., reflection, diffraction, etc.). Inspired by the great success of using a neural network to describe the optical field in computer vision, we propose a neural radio-frequency radiance field, NeRF$^\textbf{2}$, which represents a continuous volumetric scene function that makes sense of an RF signal's propagation. Particularly, after training with a few signal measurements, NeRF$^\textbf{2}$ can tell how/what signal is received at any position when it knows the position of a transmitter. As a physical-layer neural network, NeRF$^\textbf{2}$ can take advantage of the learned statistical model plus the physical model of ray tracing to generate a synthetic dataset that meets the training demands of application-layer artificial neural networks (ANNs). Thus, we can boost the performance of ANNs by the proposed turbo-learning, which mixes the true and synthetic datasets to intensify the training. Our experimental results show that turbo-learning can enhance performance by approximately 50%. We also demonstrate the power of NeRF$^\textbf{2}$ in the field of indoor localization and 5G MIMO.
    Quasi-Arithmetic Mixtures, Divergence Minimization, and Bregman Information. (arXiv:2209.07481v2 [cs.LG] UPDATED)
    Markov Chain Monte Carlo methods for sampling from complex distributions and estimating normalization constants often simulate samples from a sequence of intermediate distributions along an annealing path, which bridges between a tractable initial distribution and a target density of interest. Prior work has constructed annealing paths using quasi-arithmetic means, and interpreted the resulting intermediate densities as minimizing an expected divergence to the endpoints. We provide a comprehensive analysis of this 'centroid' property using Bregman divergences under a monotonic embedding of the density function, thereby associating common divergences such as Amari's and Renyi's ${\alpha}$-divergences, ${(\alpha,\beta)}$-divergences, and the Jensen-Shannon divergence with intermediate densities along an annealing path. Our analysis highlights the interplay between parametric families, quasi-arithmetic means, and divergence functions using the rho-tau Bregman divergence framework of Zhang (2004, 2013).
    MemSAC: Memory Augmented Sample Consistency for Large Scale Unsupervised Domain Adaptation. (arXiv:2207.12389v2 [cs.CV] UPDATED)
    Practical real-world datasets with plentiful categories introduce new challenges for unsupervised domain adaptation, such as small inter-class discriminability, which existing approaches relying on domain invariance alone cannot handle sufficiently well. In this work we propose MemSAC, which exploits sample-level similarity across source and target domains to achieve discriminative transfer, along with architectures that scale to a large number of categories. For this purpose, we first introduce a memory-augmented approach to efficiently extract pairwise similarity relations between labeled source and unlabeled target domain instances, suited to handle an arbitrary number of classes. Next, we propose and theoretically justify a novel variant of the contrastive loss to promote local consistency among within-class cross-domain samples while enforcing separation between classes, thus preserving discriminative transfer from source to target. We validate the advantages of MemSAC with significant improvements over previous state-of-the-art on multiple challenging transfer tasks designed for large-scale adaptation, such as DomainNet with 345 classes and fine-grained adaptation on the Caltech-UCSD Birds dataset with 200 classes. We also provide in-depth analysis and insights into the effectiveness of MemSAC.
    Learning Joint Latent Space EBM Prior Model for Multi-layer Generator. (arXiv:2306.06323v2 [cs.CV] UPDATED)
    This paper studies the fundamental problem of learning multi-layer generator models. The multi-layer generator model builds multiple layers of latent variables as a prior model on top of the generator, which benefits learning complex data distribution and hierarchical representations. However, such a prior model usually focuses on modeling inter-layer relations between latent variables by assuming non-informative (conditional) Gaussian distributions, which can be limited in model expressivity. To tackle this issue and learn more expressive prior models, we propose an energy-based model (EBM) on the joint latent space over all layers of latent variables with the multi-layer generator as its backbone. Such joint latent space EBM prior model captures the intra-layer contextual relations at each layer through layer-wise energy terms, and latent variables across different layers are jointly corrected. We develop a joint training scheme via maximum likelihood estimation (MLE), which involves Markov Chain Monte Carlo (MCMC) sampling for both prior and posterior distributions of the latent variables from different layers. To ensure efficient inference and learning, we further propose a variational training scheme where an inference model is used to amortize the costly posterior MCMC sampling. Our experiments demonstrate that the learned model can be expressive in generating high-quality images and capturing hierarchical features for better outlier detection.  ( 2 min )
    Precise localization within the GI tract by combining classification of CNNs and time-series analysis of HMMs. (arXiv:2310.07895v1 [cs.LG])
    This paper presents a method to efficiently classify the gastroenterologic section of images derived from Video Capsule Endoscopy (VCE) studies by exploring the combination of a Convolutional Neural Network (CNN) for classification with the time-series analysis properties of a Hidden Markov Model (HMM). It is demonstrated that successive time-series analysis identifies and corrects errors in the CNN output. Our approach achieves an accuracy of $98.04\%$ on the Rhode Island (RI) Gastroenterology dataset. This allows for precise localization within the gastrointestinal (GI) tract while requiring only approximately 1M parameters, thus providing a method suitable for low-power devices.  ( 2 min )
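    The sketch below illustrates how per-frame CNN class probabilities can be smoothed by Viterbi decoding under an HMM whose transitions favor staying in the same GI section. The transition probabilities and uniform prior are illustrative assumptions, not the parameters fitted in the paper.

```python
# Hedged sketch of the CNN + HMM idea: per-frame CNN class probabilities are
# smoothed by Viterbi decoding with a "sticky" transition model.
import numpy as np

def viterbi_smooth(frame_probs, stay_prob=0.99):
    """frame_probs: (T, K) per-frame CNN class probabilities."""
    T, K = frame_probs.shape
    log_emit = np.log(frame_probs + 1e-12)
    log_trans = np.full((K, K), np.log((1 - stay_prob) / (K - 1)))
    np.fill_diagonal(log_trans, np.log(stay_prob))

    delta = np.zeros((T, K))
    back = np.zeros((T, K), dtype=int)
    delta[0] = log_emit[0]                      # uniform prior over sections
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emit[t]

    path = np.zeros(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):              # backtrack; isolated CNN errors get corrected
        path[t] = back[t + 1, path[t + 1]]
    return path
```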
    Participatory Personalization in Classification. (arXiv:2302.03874v2 [cs.LG] UPDATED)
    Machine learning models are often personalized with information that is protected, sensitive, self-reported, or costly to acquire. These models use information about people but do not facilitate nor inform their consent. Individuals cannot opt out of reporting personal information to a model, nor tell if they benefit from personalization in the first place. We introduce a family of classification models, called participatory systems, that let individuals opt into personalization at prediction time. We present a model-agnostic algorithm to learn participatory systems for personalization with categorical group attributes. We conduct a comprehensive empirical study of participatory systems in clinical prediction tasks, benchmarking them with common approaches for personalization and imputation. Our results demonstrate that participatory systems can facilitate and inform consent while improving performance and data use across all groups who report personal data.
    Efficient Hyperdimensional Computing. (arXiv:2301.10902v2 [cs.LG] UPDATED)
    Hyperdimensional computing (HDC) is a method to perform classification that uses binary vectors with high dimensions and the majority rule. This approach has the potential to be energy-efficient and hence deemed suitable for resource-limited platforms due to its simplicity and massive parallelism. However, in order to achieve high accuracy, HDC sometimes uses hypervectors with tens of thousands of dimensions. This potentially negates its efficiency advantage. In this paper, we examine the necessity of such high dimensions and conduct a detailed theoretical analysis of the relationship between hypervector dimensions and accuracy. Our results demonstrate that as the dimension of the hypervectors increases, the worst-case/average-case HDC prediction accuracy with the majority rule decreases. Building on this insight, we develop HDC models that use binary hypervectors with dimensions orders of magnitude lower than those of state-of-the-art HDC models while maintaining equivalent or even improved accuracy and efficiency. For instance, on the MNIST dataset, we achieve 91.12% HDC accuracy in image classification with a dimension of only 64. Our methods perform operations that are only 0.35% of other HDC models with dimensions of 10,000. Furthermore, we evaluate our methods on ISOLET, UCI-HAR, and Fashion-MNIST datasets and investigate the limits of HDC computing.  ( 2 min )
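    A minimal sketch of binary hyperdimensional classification with the majority rule, in the spirit of the low-dimensional models discussed above: samples are encoded as binary hypervectors, class prototypes are formed by bitwise majority voting, and queries are assigned to the nearest prototype in Hamming distance. The random-projection encoder and the 64-bit dimension are illustrative assumptions.

```python
# Hedged sketch of binary HDC with the majority rule.
import numpy as np

rng = np.random.default_rng(0)

def encode(x, projection):
    # Random-projection encoder: sign of a random projection, mapped to {0, 1}.
    return (projection @ x > 0).astype(np.uint8)

def train_prototypes(X, y, dim=64):
    projection = rng.standard_normal((dim, X.shape[1]))
    prototypes = {}
    for c in np.unique(y):
        hvs = np.stack([encode(x, projection) for x in X[y == c]])
        prototypes[c] = (hvs.mean(axis=0) >= 0.5).astype(np.uint8)  # bitwise majority rule
    return projection, prototypes

def classify(x, projection, prototypes):
    hv = encode(x, projection)
    # Nearest class prototype in Hamming distance.
    return min(prototypes, key=lambda c: np.count_nonzero(hv ^ prototypes[c]))
```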
    Does Synthetic Data Make Large Language Models More Efficient?. (arXiv:2310.07830v1 [cs.CL])
    Natural Language Processing (NLP) has undergone transformative changes with the advent of deep learning methodologies. One challenge persistently confronting researchers is the scarcity of high-quality, annotated datasets that drive these models. This paper explores the nuances of synthetic data generation in NLP, with a focal point on template-based question generation. By assessing its advantages, including data augmentation potential and the introduction of structured variety, we juxtapose these benefits against inherent limitations, such as the risk of overfitting and the constraints posed by pre-defined templates. Drawing from empirical evaluations, we demonstrate the impact of template-based synthetic data on the performance of modern transformer models. We conclude by emphasizing the delicate balance required between synthetic and real-world data, and the future trajectories of integrating synthetic data in model training pipelines. The findings aim to guide NLP practitioners in harnessing synthetic data's potential, ensuring optimal model performance in diverse applications.  ( 2 min )
    Spiral-Elliptical automated galaxy morphology classification from telescope images. (arXiv:2310.07740v1 [astro-ph.IM])
    The classification of galaxy morphologies is an important step in the investigation of theories of hierarchical structure formation. While human expert visual classification remains quite effective and accurate, it cannot keep up with the massive influx of data from emerging sky surveys. A variety of approaches have been proposed to classify large numbers of galaxies; these approaches include crowdsourced visual classification, and automated and computational methods, such as machine learning methods based on designed morphology statistics and deep learning. In this work, we develop two novel galaxy morphology statistics, descent average and descent variance, which can be efficiently extracted from telescope galaxy images. We further propose simplified versions of the existing image statistics concentration, asymmetry, and clumpiness, which have been widely used in the literature of galaxy morphologies. We utilize the galaxy image data from the Sloan Digital Sky Survey to demonstrate the effective performance of our proposed image statistics at accurately detecting spiral and elliptical galaxies when used as features of a random forest classifier.  ( 2 min )
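    The sketch below shows the overall recipe of feeding designed morphology statistics to a random forest classifier. The feature functions are crude placeholders; the paper's descent average and descent variance statistics, and its simplified concentration/asymmetry/clumpiness measures, are not reproduced here.

```python
# Hedged sketch: designed morphology statistics as features of a random forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def morphology_features(image):
    # Placeholder stand-ins for concentration/asymmetry/clumpiness-style statistics.
    rotated = image[::-1, ::-1]
    return np.array([
        image.max() / (image.mean() + 1e-9),           # crude concentration proxy
        np.abs(image - rotated).sum() / image.sum(),   # crude asymmetry proxy
        image.std(),                                   # crude clumpiness proxy
    ])

def train_classifier(images, labels):                  # labels: 0 = elliptical, 1 = spiral
    X = np.stack([morphology_features(im) for im in images])
    return RandomForestClassifier(n_estimators=200, random_state=0).fit(X, labels)
```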
    Joint Metrics Matter: A Better Standard for Trajectory Forecasting. (arXiv:2305.06292v2 [cs.RO] UPDATED)
    Multi-modal trajectory forecasting methods commonly evaluate using single-agent metrics (marginal metrics), such as minimum Average Displacement Error (ADE) and Final Displacement Error (FDE), which fail to capture joint performance of multiple interacting agents. Only focusing on marginal metrics can lead to unnatural predictions, such as colliding trajectories or diverging trajectories for people who are clearly walking together as a group. Consequently, methods optimized for marginal metrics lead to overly-optimistic estimations of performance, which is detrimental to progress in trajectory forecasting research. In response to the limitations of marginal metrics, we present the first comprehensive evaluation of state-of-the-art (SOTA) trajectory forecasting methods with respect to multi-agent metrics (joint metrics): JADE, JFDE, and collision rate. We demonstrate the importance of joint metrics as opposed to marginal metrics with quantitative evidence and qualitative examples drawn from the ETH / UCY and Stanford Drone datasets. We introduce a new loss function incorporating joint metrics that, when applied to a SOTA trajectory forecasting method, achieves a 7\% improvement in JADE / JFDE on the ETH / UCY datasets with respect to the previous SOTA. Our results also indicate that optimizing for joint metrics naturally leads to an improvement in interaction modeling, as evidenced by a 16\% decrease in mean collision rate on the ETH / UCY datasets with respect to the previous SOTA. Code is available at \texttt{\hyperlink{https://github.com/ericaweng/joint-metrics-matter}{github.com/ericaweng/joint-metrics-matter}}.  ( 3 min )
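    The distinction between marginal and joint metrics can be made concrete with a short sketch: marginal minADE/minFDE pick the best sample independently per agent, whereas JADE/JFDE force all agents in a scene to share one sample index. Array shapes and names are assumptions for illustration.

```python
# Hedged sketch of marginal vs. joint displacement metrics.
# pred: (S, A, T, 2) = S sampled futures for A agents over T steps; gt: (A, T, 2).
import numpy as np

def min_ade_fde(pred, gt):
    err = np.linalg.norm(pred - gt[None], axis=-1)   # (S, A, T)
    ade = err.mean(-1).min(0).mean()                 # best sample chosen per agent
    fde = err[..., -1].min(0).mean()
    return ade, fde

def joint_ade_fde(pred, gt):
    err = np.linalg.norm(pred - gt[None], axis=-1)   # (S, A, T)
    ade = err.mean(-1).mean(-1).min()                # one sample shared by the whole scene
    fde = err[..., -1].mean(-1).min()
    return ade, fde
```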
    DeePref: Deep Reinforcement Learning For Video Prefetching In Content Delivery Networks. (arXiv:2310.07881v1 [cs.NI])
    Content Delivery Networks carry the majority of Internet traffic, and the increasing demand for video content, a major share of IP traffic across the Internet, highlights the importance of caching and prefetching optimization algorithms. Prefetching aims to make data available in the cache before the requester places its request to reduce access time and improve the Quality of Experience on the user side. Prefetching is well investigated in operating systems, compiler instructions, in-memory cache, local storage systems, high-speed networks, and cloud systems. Traditional prefetching techniques are well adapted to a particular access pattern, but fail to adapt to sudden variations or randomization in workloads. This paper explores the use of reinforcement learning to tackle the changes in user access patterns and automatically adapt over time. To this end, we propose DeePref, a Deep Reinforcement Learning agent for online video content prefetching in Content Delivery Networks. DeePref is a prefetcher implemented on edge networks and is agnostic to hardware design, operating systems, and applications. Our results show that DeePref DRQN, using a real-world dataset, achieves a 17% increase in prefetching accuracy and a 28% increase in prefetching coverage on average compared to baseline approaches that use video content popularity as a building block to statically or dynamically make prefetching decisions. We also study the possibility of transfer learning of statistical models from one edge network into another, where unseen user requests from an unknown distribution are observed. In terms of transfer learning, the increases in prefetching accuracy and prefetching coverage are $30\%$ and $10\%$, respectively. Our source code will be available on Github.  ( 3 min )
    Non-Stationary Contextual Bandit Learning via Neural Predictive Ensemble Sampling. (arXiv:2310.07786v1 [cs.LG])
    Real-world applications of contextual bandits often exhibit non-stationarity due to seasonality, serendipity, and evolving social trends. While a number of non-stationary contextual bandit learning algorithms have been proposed in the literature, they excessively explore due to a lack of prioritization for information of enduring value, or are designed in ways that do not scale in modern applications with high-dimensional user-specific features and large action sets, or both. In this paper, we introduce a novel non-stationary contextual bandit algorithm that addresses these concerns. It combines a scalable, deep-neural-network-based architecture with a carefully designed exploration mechanism that strategically prioritizes collecting information with the most lasting value in a non-stationary environment. Through empirical evaluations on two real-world recommendation datasets, which exhibit pronounced non-stationarity, we demonstrate that our approach significantly outperforms the state-of-the-art baselines.  ( 2 min )
    Multi-Scale Spatial-Temporal Recurrent Networks for Traffic Flow Prediction. (arXiv:2310.08138v1 [cs.LG])
    Traffic flow prediction is one of the most fundamental tasks of intelligent transportation systems. The complex and dynamic spatial-temporal dependencies make traffic flow prediction quite challenging. Although existing spatial-temporal graph neural networks have achieved prominent results, they often encounter challenges such as (1) relying on a fixed graph, which limits the predictive performance of the model, (2) insufficiently capturing complex spatial-temporal dependencies simultaneously, and (3) paying insufficient attention to spatial-temporal information at different time lengths. In this paper, we propose a Multi-Scale Spatial-Temporal Recurrent Network for traffic flow prediction, namely MSSTRN, which consists of two different recurrent neural networks: a single-step gated recurrent unit and a multi-step gated recurrent unit, to fully capture the complex spatial-temporal information in the traffic data under different time windows. Moreover, we propose a spatial-temporal synchronous attention mechanism that integrates adaptive position graph convolutions into the self-attention mechanism to achieve synchronous capture of spatial-temporal dependencies. We conducted extensive experiments on four real traffic datasets and demonstrated that our model achieves the best prediction accuracy with non-trivial margins compared to all twenty baseline methods.  ( 2 min )
    Robust 1-bit Compressed Sensing with Iterative Hard Thresholding. (arXiv:2310.08019v1 [cs.IT])
    In 1-bit compressed sensing, the aim is to estimate a $k$-sparse unit vector $x\in S^{n-1}$ within an $\epsilon$ error (in $\ell_2$) from a minimal number of linear measurements that are quantized to just their signs, i.e., from measurements of the form $y = \mathrm{Sign}(\langle a, x\rangle).$ In this paper, we study a noisy version where a fraction of the measurements can be flipped, potentially by an adversary. In particular, we analyze the Binary Iterative Hard Thresholding (BIHT) algorithm, a proximal gradient descent on a properly defined loss function used for 1-bit compressed sensing, in this noisy setting. It is known from recent results that, with $\tilde{O}(\frac{k}{\epsilon})$ noiseless measurements, BIHT provides an estimate within $\epsilon$ error. This result is optimal and universal, meaning one set of measurements works for all sparse vectors. In this paper, we show that BIHT also provides better results than all known methods for the noisy setting. We show that when up to $\tau$-fraction of the sign measurements are incorrect (adversarial error), with the same number of measurements as before, BIHT agnostically provides an estimate of $x$ within an $\tilde{O}(\epsilon+\tau)$ error, maintaining the universality of measurements. This establishes stability of iterative hard thresholding in the presence of measurement error. To obtain the result, we use the restricted approximate invertibility of Gaussian matrices, as well as a tight analysis of the high-dimensional geometry of the adversarially corrupted measurements.  ( 3 min )
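    A minimal sketch of the BIHT iteration for measurements $y = \mathrm{Sign}(\langle a, x\rangle)$, possibly with flipped signs: a (sub)gradient step on the one-sided loss followed by hard thresholding to the top-$k$ entries and renormalization. The step size and iteration count are illustrative choices, not the constants from the paper's analysis.

```python
# Hedged sketch of Binary Iterative Hard Thresholding (BIHT).
import numpy as np

def hard_threshold(v, k):
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def biht(A, y, k, iters=100, step=None):
    m, n = A.shape
    step = step if step is not None else 1.0 / m
    x = np.zeros(n)
    for _ in range(iters):
        # (Sub)gradient of the one-sided loss: only wrong-sign measurements contribute.
        g = A.T @ (y - np.sign(A @ x))
        x = hard_threshold(x + step * g, k)
        norm = np.linalg.norm(x)
        if norm > 0:
            x /= norm                 # the estimate lives on the unit sphere
    return x
```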
    Relaxing the Additivity Constraints in Decentralized No-Regret High-Dimensional Bayesian Optimization. (arXiv:2305.19838v2 [cs.LG] UPDATED)
    Bayesian Optimization (BO) is typically used to optimize an unknown function $f$ that is noisy and costly to evaluate, by exploiting an acquisition function that must be maximized at each optimization step. Even if provably asymptotically optimal BO algorithms are efficient at optimizing low-dimensional functions, scaling them to high-dimensional spaces remains an open problem, often tackled by assuming an additive structure for $f$. By doing so, BO algorithms typically introduce additional restrictive assumptions on the additive structure that reduce their applicability domain. This paper contains two main contributions: (i) we relax the restrictive assumptions on the additive structure of $f$, at the expense of weakening the maximization guarantees of the acquisition function, and (ii) we address the over-exploration problem for decentralized BO algorithms. To these ends, we propose DumBO, an asymptotically optimal decentralized BO algorithm that achieves very competitive performance against state-of-the-art BO algorithms, especially when the additive structure of $f$ comprises high-dimensional factors.  ( 2 min )
    Counterfactual Explanations for Time Series Forecasting. (arXiv:2310.08137v1 [cs.LG])
    Among recent developments in time series forecasting methods, deep forecasting models have gained popularity as they can utilize hidden feature patterns in time series to improve forecasting performance. Nevertheless, the majority of current deep forecasting models are opaque, hence making it challenging to interpret the results. While counterfactual explanations have been extensively employed as a post-hoc approach for explaining classification models, their application to forecasting models still remains underexplored. In this paper, we formulate the novel problem of counterfactual generation for time series forecasting, and propose an algorithm, called ForecastCF, that solves the problem by applying gradient-based perturbations to the original time series. ForecastCF guides the perturbations by applying constraints to the forecasted values to obtain desired prediction outcomes. We experimentally evaluate ForecastCF using four state-of-the-art deep model architectures and compare it to two baselines. Our results show that ForecastCF outperforms the baselines in terms of counterfactual validity and data manifold closeness. Overall, our findings suggest that ForecastCF can generate meaningful and relevant counterfactual explanations for various forecasting tasks.  ( 2 min )
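    A hedged sketch of the gradient-based perturbation loop described above: the input window is adjusted until the forecaster's output falls inside a desired band, while staying close to the original series. The hinge-style band loss, proximity weight, and optimizer are illustrative assumptions, not necessarily ForecastCF's exact objective.

```python
# Hedged sketch of counterfactual generation for a time-series forecaster.
import torch

def forecast_counterfactual(model, x, lower, upper, steps=200, lr=0.01):
    """x: (1, T_in) input window; lower/upper: (1, T_out) desired forecast bounds."""
    x_cf = x.clone().requires_grad_(True)
    opt = torch.optim.Adam([x_cf], lr=lr)
    for _ in range(steps):
        y_hat = model(x_cf)
        # Penalize forecasts outside the desired band; stay close to the original series.
        band_loss = torch.relu(lower - y_hat).mean() + torch.relu(y_hat - upper).mean()
        proximity = (x_cf - x).pow(2).mean()
        loss = band_loss + 0.1 * proximity
        opt.zero_grad()
        loss.backward()
        opt.step()
        if band_loss.item() == 0.0:   # forecast already inside the desired band
            break
    return x_cf.detach()
```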
    Language Models As Semantic Indexers. (arXiv:2310.07815v1 [cs.IR])
    Semantic identifier (ID) is an important concept in information retrieval that aims to preserve the semantics of objects such as documents and items inside their IDs. Previous studies typically adopt a two-stage pipeline to learn semantic IDs by first procuring embeddings using off-the-shelf text encoders and then deriving IDs based on the embeddings. However, each step introduces potential information loss and there is usually an inherent mismatch between the distribution of embeddings within the latent space produced by text encoders and the anticipated distribution required for semantic indexing. Nevertheless, it is non-trivial to design a method that can learn the document's semantic representations and its hierarchical structure simultaneously, given that semantic IDs are discrete and sequentially structured, and the semantic supervision is deficient. In this paper, we introduce LMINDEXER, a self-supervised framework to learn semantic IDs with a generative language model. We tackle the challenge of sequential discrete ID by introducing a semantic indexer capable of generating neural sequential discrete representations with progressive training and contrastive learning. In response to the semantic supervision deficiency, we propose to train the model with a self-supervised document reconstruction objective. The learned semantic indexer can facilitate various downstream tasks, such as recommendation and retrieval. We conduct experiments on three tasks including recommendation, product search, and document retrieval on five datasets from various domains, where LMINDEXER outperforms competitive baselines significantly and consistently.
    Federated Learning from Small Datasets. (arXiv:2110.03469v3 [cs.LG] UPDATED)
    Federated learning allows multiple parties to collaboratively train a joint model without sharing local data. This enables applications of machine learning in settings of inherently distributed, undisclosable data such as in the medical domain. In practice, joint training is usually achieved by aggregating local models, for which local training objectives have to be in expectation similar to the joint (global) objective. Often, however, local datasets are so small that local objectives differ greatly from the global objective, resulting in federated learning to fail. We propose a novel approach that intertwines model aggregations with permutations of local models. The permutations expose each local model to a daisy chain of local datasets resulting in more efficient training in data-sparse domains. This enables training on extremely small local datasets, such as patient data across hospitals, while retaining the training efficiency and privacy benefits of federated learning.
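    The sketch below captures the intertwining of aggregation with model permutation ("daisy chaining"): in most rounds each client model is trained locally and then passed to another client, and periodically all models are averaged as in standard federated learning. The schedule and the local-training and averaging callbacks are illustrative assumptions.

```python
# Hedged sketch of federated daisy chaining: permute models between clients,
# aggregate periodically.
import copy
import random

def federated_daisy_chain(global_model, clients, local_train, average,
                          rounds=50, aggregate_every=5):
    """clients: list of local datasets; local_train(model, data) -> model;
    average(models) -> model (e.g., parameter-wise mean)."""
    models = [copy.deepcopy(global_model) for _ in clients]
    for r in range(1, rounds + 1):
        models = [local_train(m, data) for m, data in zip(models, clients)]
        if r % aggregate_every == 0:
            global_model = average(models)               # aggregation round
            models = [copy.deepcopy(global_model) for _ in clients]
        else:
            random.shuffle(models)                       # daisy-chain round: pass models on
    return global_model
```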
    Interpretable Diffusion via Information Decomposition. (arXiv:2310.07972v1 [cs.LG])
    Denoising diffusion models enable conditional generation and density modeling of complex relationships like images and text. However, the nature of the learned relationships is opaque, making it difficult to understand precisely what relationships between words and parts of an image are captured, or to predict the effect of an intervention. We illuminate the fine-grained relationships learned by diffusion models by noticing a precise relationship between diffusion and information decomposition. Exact expressions for mutual information and conditional mutual information can be written in terms of the denoising model. Furthermore, pointwise estimates can easily be computed as well, allowing us to ask questions about the relationships between specific images and captions. Decomposing information even further to understand which variables in a high-dimensional space carry information is a long-standing problem. For diffusion models, we show that a natural non-negative decomposition of mutual information emerges, allowing us to quantify informative relationships between words and pixels in an image. We exploit these new relations to measure the compositional understanding of diffusion models, to do unsupervised localization of objects in images, and to measure effects when selectively editing images through prompt interventions.
    ClimateBERT-NetZero: Detecting and Assessing Net Zero and Reduction Targets. (arXiv:2310.08096v1 [cs.LG])
    Public and private actors struggle to assess the vast amounts of information about sustainability commitments made by various institutions. To address this problem, we create a novel tool for automatically detecting corporate, national, and regional net zero and reduction targets in three steps. First, we introduce an expert-annotated data set with 3.5K text samples. Second, we train and release ClimateBERT-NetZero, a natural language classifier to detect whether a text contains a net zero or reduction target. Third, we showcase its analysis potential with two use cases: We first demonstrate how ClimateBERT-NetZero can be combined with conventional question-answering (Q&A) models to analyze the ambitions displayed in net zero and reduction targets. Furthermore, we employ the ClimateBERT-NetZero model on quarterly earning call transcripts and outline how communication patterns evolve over time. Our experiments demonstrate promising pathways for extracting and analyzing net zero and emission reduction targets at scale.
    ZEST: Attention-based Zero-Shot Learning for Unseen IoT Device Classification. (arXiv:2310.08036v1 [cs.NI])
    Recent research works have proposed machine learning models for classifying IoT devices connected to a network. However, there is still a practical challenge of not having all devices (and hence their traffic) available during the training of a model. This essentially means, during the operational phase, we need to classify new devices not seen during the training phase. To address this challenge, we propose ZEST -- a ZSL (zero-shot learning) framework based on self-attention for classifying both seen and unseen devices. ZEST consists of i) a self-attention based network feature extractor, termed SANE, for extracting latent space representations of IoT traffic, ii) a generative model that trains a decoder using latent features to generate pseudo data, and iii) a supervised model that is trained on the generated pseudo data for classifying devices. We carry out extensive experiments on real IoT traffic data; our experiments demonstrate i) ZEST achieves significant improvement (in terms of accuracy) over the baselines; ii) ZEST is able to better extract meaningful representations than LSTM which has been commonly used for modeling network traffic.  ( 2 min )
    CrIBo: Self-Supervised Learning via Cross-Image Object-Level Bootstrapping. (arXiv:2310.07855v1 [cs.CV])
    Leveraging nearest neighbor retrieval for self-supervised representation learning has proven beneficial with object-centric images. However, this approach faces limitations when applied to scene-centric datasets, where multiple objects within an image are only implicitly captured in the global representation. Such global bootstrapping can lead to undesirable entanglement of object representations. Furthermore, even object-centric datasets stand to benefit from a finer-grained bootstrapping approach. In response to these challenges, we introduce a novel Cross-Image Object-Level Bootstrapping method tailored to enhance dense visual representation learning. By employing object-level nearest neighbor bootstrapping throughout the training, CrIBo emerges as a notably strong and adequate candidate for in-context learning, leveraging nearest neighbor retrieval at test time. CrIBo shows state-of-the-art performance on the latter task while being highly competitive in more standard downstream segmentation tasks. Our code and pretrained models will be publicly available upon acceptance.  ( 2 min )
    CleftGAN: Adapting A Style-Based Generative Adversarial Network To Create Images Depicting Cleft Lip Deformity. (arXiv:2310.07969v1 [cs.CV])
    A major obstacle when attempting to train a machine learning system to evaluate facial clefts is the scarcity of large datasets of high-quality, ethics board-approved patient images. In response, we have built a deep learning-based cleft lip generator designed to produce an almost unlimited number of artificial images exhibiting high-fidelity facsimiles of cleft lip with wide variation. We undertook a transfer learning protocol testing different versions of StyleGAN-ADA (a generative adversarial network image generator incorporating adaptive data augmentation (ADA)) as the base model. Training images depicting a variety of cleft deformities were pre-processed to adjust for rotation, scaling, color adjustment and background blurring. The ADA modification of the primary algorithm permitted construction of our new generative model while requiring input of a relatively small number of training images. Adversarial training was carried out using 514 unique frontal photographs of cleft-affected faces to adapt a pre-trained model based on 70,000 normal faces. The Frechet Inception Distance (FID) was used to measure the similarity of the newly generated facial images to the cleft training dataset, while Perceptual Path Length (PPL) and the novel Divergence Index of Severity Histograms (DISH) measures were also used to assess the performance of the image generator that we dub CleftGAN. We found that StyleGAN3 with translation invariance (StyleGAN3-t) performed optimally as a base model. Generated images achieved a low FID reflecting a close similarity to our training input dataset of genuine cleft images. Low PPL and DISH measures reflected a smooth and semantically valid interpolation of images through the transfer learning process and a similar distribution of severity in the training and generated images, respectively.  ( 3 min )
    Beyond Traditional DoE: Deep Reinforcement Learning for Optimizing Experiments in Model Identification of Battery Dynamics. (arXiv:2310.08198v1 [cs.LG])
    Model identification of battery dynamics is a central problem in energy research; many energy management systems and design processes rely on accurate battery models for efficiency optimization. The standard methodology for battery modelling is traditional design of experiments (DoE), where the battery dynamics are excited with many different current profiles and the measured outputs are used to estimate the system dynamics. However, although it is possible to obtain useful models with the traditional approach, the process is time consuming and expensive because of the need to sweep many different current-profile configurations. In the present work, a novel DoE approach is developed based on deep reinforcement learning, which alters the configuration of the experiments on the fly based on the statistics of past experiments. Instead of sticking to a library of predefined current profiles, the proposed approach modifies the current profiles dynamically by updating the output space covered by past measurements, hence only the current profiles that are informative for future experiments are applied. Simulations and real experiments are used to show that the proposed approach gives models that are as accurate as those obtained with traditional DoE but by using 85\% less resources.  ( 2 min )
    Cost-Driven Hardware-Software Co-Optimization of Machine Learning Pipelines. (arXiv:2310.07940v1 [cs.LG])
    Researchers have long touted a vision of the future enabled by a proliferation of internet-of-things devices, including smart sensors, homes, and cities. Increasingly, embedding intelligence in such devices involves the use of deep neural networks. However, their storage and processing requirements make them prohibitive for cheap, off-the-shelf platforms. Overcoming those requirements is necessary for enabling widely-applicable smart devices. While many ways of making models smaller and more efficient have been developed, there is a lack of understanding of which ones are best suited for particular scenarios. More importantly for edge platforms, those choices cannot be analyzed in isolation from cost and user experience. In this work, we holistically explore how quantization, model scaling, and multi-modality interact with system components such as memory, sensors, and processors. We perform this hardware/software co-design from the cost, latency, and user-experience perspective, and develop a set of guidelines for optimal system design and model deployment for the most cost-constrained platforms. We demonstrate our approach using an end-to-end, on-device, biometric user authentication system using a $20 ESP-EYE board.  ( 2 min )
    CHIP: Contrastive Hierarchical Image Pretraining. (arXiv:2310.08304v1 [cs.CV])
    Few-shot object classification is the task of classifying objects in an image with a limited number of examples as supervision. We propose a one-shot/few-shot classification model that can classify an object of any unseen class into a relatively general category in a hierarchy-based classification. Our model uses a ResNet152 classifier trained with a three-level hierarchical contrastive loss to classify an object based on features extracted from its image embedding, even for classes not used during the training phase. For our experiments, we used a subset of the ImageNet (ILSVRC-12) dataset that contains only the animal classes to train our model, and created our own dataset of unseen classes to evaluate the trained model. Our model provides satisfactory results in classifying unknown objects into a generic category, which is discussed in greater detail later in the paper.  ( 2 min )
    On the Computational Complexity of Private High-dimensional Model Selection via the Exponential Mechanism. (arXiv:2310.07852v1 [stat.ML])
    We consider the problem of model selection in a high-dimensional sparse linear regression model under the differential privacy framework. In particular, we consider the problem of differentially private best subset selection and study its utility guarantee. We adopt the well-known exponential mechanism for selecting the best model, and under a certain margin condition, we establish its strong model recovery property. However, the exponential search space of the exponential mechanism poses a serious computational bottleneck. To overcome this challenge, we propose a Metropolis-Hastings algorithm for the sampling step and establish its polynomial mixing time to its stationary distribution in the problem parameters $n,p$, and $s$. Furthermore, we also establish approximate differential privacy for the final estimates of the Metropolis-Hastings random walk using its mixing property. Finally, we also perform some illustrative simulations that echo the theoretical findings of our main results.  ( 2 min )
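    A minimal sketch of a Metropolis-Hastings random walk whose stationary distribution is the exponential mechanism over size-$s$ subsets, with utility taken here as negative residual sum of squares. The swap proposal and the omission of an explicit sensitivity normalization are illustrative simplifications.

```python
# Hedged sketch: MH sampling from the exponential mechanism over feature subsets.
import numpy as np

def rss(X, y, subset):
    Xs = X[:, list(subset)]
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    r = y - Xs @ beta
    return float(r @ r)

def mh_best_subset(X, y, s, epsilon=1.0, iters=2000, rng=np.random.default_rng(0)):
    n, p = X.shape
    current = set(rng.choice(p, size=s, replace=False))
    current_score = rss(X, y, current)
    for _ in range(iters):
        # Propose swapping one selected feature with one unselected feature.
        out_feat = rng.choice(list(current))
        in_feat = rng.choice(list(set(range(p)) - current))
        proposal = (current - {out_feat}) | {in_feat}
        proposal_score = rss(X, y, proposal)
        # Exponential-mechanism acceptance: lower RSS means higher utility.
        log_accept = (epsilon / 2.0) * (current_score - proposal_score)
        if np.log(rng.uniform()) < log_accept:
            current, current_score = proposal, proposal_score
    return sorted(int(j) for j in current)
```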
    The Expressive Power of Transformers with Chain of Thought. (arXiv:2310.07923v1 [cs.LG])
    Recent theoretical work has identified surprisingly simple reasoning problems, such as checking if two nodes in a graph are connected or simulating finite-state machines, that are provably unsolvable by standard transformers that answer immediately after reading their input. However, in practice, transformers' reasoning can be improved by allowing them to use a "chain of thought" or "scratchpad", i.e., generate and condition on a sequence of intermediate tokens before answering. Motivated by this, we ask: Does such intermediate generation fundamentally extend the computational power of a decoder-only transformer? We show that the answer is yes, but the amount of increase depends crucially on the amount of intermediate generation. For instance, we find that transformer decoders with a logarithmic number of decoding steps (w.r.t. the input length) push the limits of standard transformers only slightly, while a linear number of decoding steps adds a clear new ability (under standard complexity conjectures): recognizing all regular languages. Our results also imply that linear steps keep transformer decoders within context-sensitive languages, and polynomial steps make them recognize exactly the class of polynomial-time solvable problems -- the first exact characterization of a type of transformers in terms of standard complexity classes. Together, our results provide a nuanced framework for understanding how the length of a transformer's chain of thought or scratchpad impacts its reasoning power.  ( 2 min )
    Emulating the dynamics of complex systems using autoregressive models on manifolds (mNARX). (arXiv:2306.16335v2 [stat.CO] UPDATED)
    We propose a novel surrogate modelling approach to efficiently and accurately approximate the response of complex dynamical systems driven by time-varying exogenous excitations over extended time periods. Our approach, namely manifold nonlinear autoregressive modelling with exogenous input (mNARX), involves constructing a problem-specific exogenous input manifold that is optimal for constructing autoregressive surrogates. The manifold, which forms the core of mNARX, is constructed incrementally by incorporating the physics of the system, as well as prior expert- and domain- knowledge. Because mNARX decomposes the full problem into a series of smaller sub-problems, each with a lower complexity than the original, it scales well with the complexity of the problem, both in terms of training and evaluation costs of the final surrogate. Furthermore, mNARX synergizes well with traditional dimensionality reduction techniques, making it highly suitable for modelling dynamical systems with high-dimensional exogenous inputs, a class of problems that is typically challenging to solve. Since domain knowledge is particularly abundant in physical systems, such as those found in civil and mechanical engineering, mNARX is well suited for these applications. We demonstrate that mNARX outperforms traditional autoregressive surrogates in predicting the response of a classical coupled spring-mass system excited by a one-dimensional random excitation. Additionally, we show that mNARX is well suited for emulating very high-dimensional time- and state-dependent systems, even when affected by active controllers, by surrogating the dynamics of a realistic aero-servo-elastic onshore wind turbine simulator. In general, our results demonstrate that mNARX offers promising prospects for modelling complex dynamical systems, in terms of accuracy and efficiency.  ( 3 min )
    Leader-Follower Neural Networks with Local Error Signals Inspired by Complex Collectives. (arXiv:2310.07885v1 [cs.LG])
    The collective behavior of a network with heterogeneous, resource-limited information processing units (e.g., group of fish, flock of birds, or network of neurons) demonstrates high self-organization and complexity. These emergent properties arise from simple interaction rules where certain individuals can exhibit leadership-like behavior and influence the collective activity of the group. Motivated by the intricacy of these collectives, we propose a neural network (NN) architecture inspired by the rules observed in nature's collective ensembles. This NN structure contains workers that encompass one or more information processing units (e.g., neurons, filters, layers, or blocks of layers). Workers are either leaders or followers, and we train a leader-follower neural network (LFNN) by leveraging local error signals and optionally incorporating backpropagation (BP) and global loss. We investigate worker behavior and evaluate LFNNs through extensive experimentation. Our LFNNs trained with local error signals achieve significantly lower error rates than previous BP-free algorithms on MNIST and CIFAR-10 and even surpass BP-enabled baselines. In the case of ImageNet, our LFNN-l demonstrates superior scalability and outperforms previous BP-free algorithms by a significant margin.  ( 2 min )
    Towards Causal Deep Learning for Vulnerability Detection. (arXiv:2310.07958v1 [cs.SE])
    Deep learning vulnerability detection has shown promising results in recent years. However, an important challenge that still blocks it from being very useful in practice is that the model is not robust under perturbation and it cannot generalize well over the out-of-distribution (OOD) data, e.g., applying a trained model to unseen projects in the real world. We hypothesize that this is because the model learned non-robust features, e.g., variable names, that have spurious correlations with labels. When the perturbed and OOD datasets no longer have the same spurious features, the model prediction fails. To address the challenge, in this paper, we introduced causality into deep learning vulnerability detection. Our approach CausalVul consists of two phases. First, we designed novel perturbations to discover spurious features that the model may use to make predictions. Second, we applied the causal learning algorithms, specifically, do-calculus, on top of existing deep learning models to systematically remove the use of spurious features and thus promote causal-based prediction. Our results show that CausalVul consistently improved the model accuracy, robustness and OOD performance for all the state-of-the-art models and datasets we experimented with. To the best of our knowledge, this is the first work that introduces do-calculus-based causal learning to software engineering models and shows it is indeed useful for improving the model accuracy, robustness and generalization. Our replication package is located at https://figshare.com/s/0ffda320dcb96c249ef2.  ( 2 min )
    D2 Pruning: Message Passing for Balancing Diversity and Difficulty in Data Pruning. (arXiv:2310.07931v1 [cs.LG])
    Analytical theories suggest that higher-quality data can lead to lower test errors in models trained on a fixed data budget. Moreover, a model can be trained on a lower compute budget without compromising performance if a dataset can be stripped of its redundancies. Coreset selection (or data pruning) seeks to select a subset of the training data so as to maximize the performance of models trained on this subset, also referred to as coreset. There are two dominant approaches: (1) geometry-based data selection for maximizing data diversity in the coreset, and (2) functions that assign difficulty scores to samples based on training dynamics. Optimizing for data diversity leads to a coreset that is biased towards easier samples, whereas, selection by difficulty ranking omits easy samples that are necessary for the training of deep learning models. This demonstrates that data diversity and importance scores are two complementary factors that need to be jointly considered during coreset selection. We represent a dataset as an undirected graph and propose a novel pruning algorithm, D2 Pruning, that uses forward and reverse message passing over this dataset graph for coreset selection. D2 Pruning updates the difficulty scores of each example by incorporating the difficulty of its neighboring examples in the dataset graph. Then, these updated difficulty scores direct a graph-based sampling method to select a coreset that encapsulates both diverse and difficult regions of the dataset space. We evaluate supervised and self-supervised versions of our method on various vision and language datasets. Results show that D2 Pruning improves coreset selection over previous state-of-the-art methods for up to 70% pruning rates. Additionally, we find that using D2 Pruning for filtering large multimodal datasets leads to increased diversity in the dataset and improved generalization of pretrained models.  ( 3 min )
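    The forward message-passing step can be sketched as follows: each example's difficulty score is augmented with the (decayed) difficulty of its nearest neighbors in embedding space before a diversity-aware sampler selects the coreset. The neighborhood size and decay weight are illustrative assumptions, not the paper's settings.

```python
# Hedged sketch of difficulty-score propagation over a dataset k-NN graph.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def propagate_difficulty(embeddings, difficulty, k=10, gamma=0.5):
    nn = NearestNeighbors(n_neighbors=k + 1).fit(embeddings)
    _, idx = nn.kneighbors(embeddings)            # idx[:, 0] is the point itself
    neighbor_scores = difficulty[idx[:, 1:]]      # (N, k) difficulty of neighbors
    # Each example's score is boosted by the decayed average of its neighbors' scores.
    return difficulty + gamma * neighbor_scores.mean(axis=1)
```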
    Promoting Robustness of Randomized Smoothing: Two Cost-Effective Approaches. (arXiv:2310.07780v1 [cs.LG])
    Randomized smoothing has recently attracted attention in the field of adversarial robustness to provide provable robustness guarantees on smoothed neural network classifiers. However, existing works show that vanilla randomized smoothing usually does not provide good robustness performance and often requires (re)training techniques on the base classifier in order to boost the robustness of the resulting smoothed classifier. In this work, we propose two cost-effective approaches to boost the robustness of randomized smoothing while preserving its clean performance. The first approach introduces a new robust training method AdvMacer, which combines adversarial training and robustness certification maximization for randomized smoothing. We show that AdvMacer can improve the robustness performance of randomized smoothing classifiers compared to SOTA baselines, while being 3x faster to train than the MACER baseline. The second approach introduces a post-processing method EsbRS, which greatly improves the robustness certificate based on building model ensembles. We explore different aspects of model ensembles that have not been studied by prior works and propose a novel design methodology to further improve the robustness of the ensemble based on our theoretical analysis.  ( 2 min )
    First-Order Dynamic Optimization for Streaming Convex Costs. (arXiv:2310.07925v1 [math.OC])
    This paper proposes a set of novel optimization algorithms for solving a class of convex optimization problems with time-varying streaming cost function. We develop an approach to track the optimal solution with a bounded error. Unlike the existing results, our algorithm is executed only by using the first-order derivatives of the cost function which makes it computationally efficient for optimization with time-varying cost function. We compare our algorithms to the gradient descent algorithm and show why gradient descent is not an effective solution for optimization problems with time-varying cost. Several examples including solving a model predictive control problem cast as a convex optimization problem with a streaming time-varying cost function demonstrate our results.  ( 2 min )
    Local Graph Clustering with Noisy Labels. (arXiv:2310.08031v1 [cs.LG])
    The growing interest in machine learning problems over graphs with additional node information such as texts, images, or labels has popularized methods that require the costly operation of processing the entire graph. Yet, little effort has been made to the development of fast local methods (i.e. without accessing the entire graph) that extract useful information from such data. To that end, we propose a study of local graph clustering using noisy node labels as a proxy for additional node information. In this setting, nodes receive initial binary labels based on cluster affiliation: 1 if they belong to the target cluster and 0 otherwise. Subsequently, a fraction of these labels is flipped. We investigate the benefits of incorporating noisy labels for local graph clustering. By constructing a weighted graph with such labels, we study the performance of graph diffusion-based local clustering method on both the original and the weighted graphs. From a theoretical perspective, we consider recovering an unknown target cluster with a single seed node in a random graph with independent noisy node labels. We provide sufficient conditions on the label noise under which, with high probability, using diffusion in the weighted graph yields a more accurate recovery of the target cluster. This approach proves more effective than using the given labels alone or using diffusion in the label-free original graph. Empirically, we show that reliable node labels can be obtained with just a few samples from an attributed graph. Moreover, utilizing these labels via diffusion in the weighted graph leads to significantly better local clustering performance across several real-world datasets, improving F1 scores by up to 13%.  ( 3 min )
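    The sketch below illustrates diffusion on a label-weighted graph: edges whose endpoints both carry (noisy) label 1 are up-weighted, a personalized PageRank vector is diffused from the seed node, and the cluster is read off from the degree-normalized ranking. The weighting scheme and dense-matrix iteration are illustrative simplifications of a local (push-style) method.

```python
# Hedged sketch of diffusion-based local clustering on a label-weighted graph.
import numpy as np

def label_weighted_ppr(adj, labels, seed, alpha=0.15, boost=3.0, iters=100):
    """adj: (N, N) dense 0/1 adjacency; labels: noisy 0/1 node labels; seed: node id."""
    W = adj * np.where(np.outer(labels, labels) == 1, boost, 1.0)
    P = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)   # row-stochastic transitions
    s = np.zeros(len(adj))
    s[seed] = 1.0
    p = s.copy()
    for _ in range(iters):
        p = alpha * s + (1 - alpha) * P.T @ p                 # personalized PageRank iteration
    return p

def recover_cluster(adj, labels, seed, size):
    p = label_weighted_ppr(adj, labels, seed)
    deg = adj.sum(axis=1)
    return np.argsort(-p / np.maximum(deg, 1))[:size]         # degree-normalized sweep order
```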
    A Framework for Adapting Offline Algorithms to Solve Combinatorial Multi-Armed Bandit Problems with Bandit Feedback. (arXiv:2301.13326v2 [cs.LG] UPDATED)
    We investigate the problem of stochastic, combinatorial multi-armed bandits where the learner only has access to bandit feedback and the reward function can be non-linear. We provide a general framework for adapting discrete offline approximation algorithms into sublinear $\alpha$-regret methods that only require bandit feedback, achieving $\mathcal{O}\left(T^\frac{2}{3}\log(T)^\frac{1}{3}\right)$ expected cumulative $\alpha$-regret dependence on the horizon $T$. The framework only requires the offline algorithms to be robust to small errors in function evaluation. The adaptation procedure does not even require explicit knowledge of the offline approximation algorithm -- the offline algorithm can be used as a black box subroutine. To demonstrate the utility of the proposed framework, the proposed framework is applied to diverse applications in submodular maximization. The new CMAB algorithms for submodular maximization with knapsack constraints outperform a full-bandit method developed for the adversarial setting in experiments with real-world data.  ( 3 min )
    Learning to Simulate Tree-Branch Dynamics for Manipulation. (arXiv:2306.03410v2 [cs.RO] UPDATED)
    We propose to use a simulation driven inverse inference approach to model the dynamics of tree branches under manipulation. Learning branch dynamics and gaining the ability to manipulate deformable vegetation can help with occlusion-prone tasks, such as fruit picking in dense foliage, as well as moving overhanging vines and branches for navigation in dense vegetation. The underlying deformable tree geometry is encapsulated as coarse spring abstractions executed on parallel, non-differentiable simulators. The implicit statistical model defined by the simulator, reference trajectories obtained by actively probing the ground truth, and the Bayesian formalism, together guide the spring parameter posterior density estimation. Our non-parametric inference algorithm, based on Stein Variational Gradient Descent, incorporates biologically motivated assumptions into the inference process as neural network driven learnt joint priors; moreover, it leverages the finite difference scheme for gradient approximations. Real and simulated experiments confirm that our model can predict deformation trajectories, quantify the estimation uncertainty, and it can perform better when base-lined against other inference algorithms, particularly from the Monte Carlo family. The model displays strong robustness properties in the presence of heteroscedastic sensor noise; furthermore, it can generalise to unseen grasp locations.  ( 2 min )
    A Transfer-Learning-Based Prognosis Prediction Paradigm that Bridges Data Distribution Shift across EMR Datasets. (arXiv:2310.07799v1 [cs.LG])
    Due to the limited information about emerging diseases, symptoms are hard to notice and recognize, so the window for clinical intervention can be missed. An effective prognostic model is expected to assist doctors in making the right diagnosis and designing a personalized treatment plan, so as to promptly prevent unfavorable outcomes. However, in the early stage of a disease, limited data collection and clinical experience, plus privacy and ethics concerns, may restrict the data available for reference, to the extent that even data labels are difficult to mark correctly. In addition, Electronic Medical Record (EMR) data from different diseases, or from different sources of the same disease, can exhibit serious cross-dataset feature misalignment problems, greatly degrading the performance of deep learning models. This article introduces a transfer learning method to build a transition model from a source dataset to a target dataset. By constraining the distribution shift of features generated in disparate domains, it captures domain-invariant features that are exclusively relevant to downstream tasks, cultivating a unified domain-invariant encoder across task domains to achieve better feature representation. Experimental results on several target tasks demonstrate that our proposed model outperforms competing baseline methods and converges faster during training, especially when data are limited. Extensive experiments have proven the efficacy of our method in providing more accurate predictions for newly emergent pandemics and other diseases.  ( 3 min )
    Elastic Decision Transformer. (arXiv:2307.02484v5 [cs.LG] UPDATED)
    This paper introduces Elastic Decision Transformer (EDT), a significant advancement over the existing Decision Transformer (DT) and its variants. Although DT purports to generate an optimal trajectory, empirical evidence suggests it struggles with trajectory stitching, a process involving the generation of an optimal or near-optimal trajectory from the best parts of a set of sub-optimal trajectories. The proposed EDT differentiates itself by facilitating trajectory stitching during action inference at test time, achieved by adjusting the history length maintained in DT. Further, the EDT optimizes the trajectory by retaining a longer history when the previous trajectory is optimal and a shorter one when it is sub-optimal, enabling it to "stitch" with a more optimal trajectory. Extensive experimentation demonstrates EDT's ability to bridge the performance gap between DT-based and Q Learning-based approaches. In particular, the EDT outperforms Q Learning-based methods in a multi-task regime on the D4RL locomotion benchmark and Atari games. Videos are available at: https://kristery.github.io/edt/  ( 2 min )
    Limits of Model Selection under Transfer Learning. (arXiv:2305.00152v4 [stat.ML] UPDATED)
    Theoretical studies on transfer learning or domain adaptation have so far focused on situations with a known hypothesis class or model; however in practice, some amount of model selection is usually involved, often appearing under the umbrella term of hyperparameter-tuning: for example, one may think of the problem of tuning for the right neural network architecture towards a target task, while leveraging data from a related source task. Now, in addition to the usual tradeoffs on approximation vs estimation errors involved in model selection, this problem brings in a new complexity term, namely, the transfer distance between source and target distributions, which is known to vary with the choice of hypothesis class. We present a first study of this problem, focusing on classification; in particular, the analysis reveals some remarkable phenomena: adaptive rates, i.e., those achievable with no distributional information, can be arbitrarily slower than oracle rates, i.e., when given knowledge on distances.  ( 2 min )
    GROOT: Learning to Follow Instructions by Watching Gameplay Videos. (arXiv:2310.08235v1 [cs.AI])
    We study the problem of building a controller that can follow open-ended instructions in open-world environments. We propose to follow reference videos as instructions, which offer expressive goal specifications while eliminating the need for expensive text-gameplay annotations. A new learning framework is derived to allow learning such instruction-following controllers from gameplay videos while producing a video instruction encoder that induces a structured goal space. We implement our agent GROOT in a simple yet effective encoder-decoder architecture based on causal transformers. We evaluate GROOT against open-world counterparts and human players on a proposed Minecraft SkillForge benchmark. The Elo ratings clearly show that GROOT is closing the human-machine gap as well as exhibiting a 70% winning rate over the best generalist agent baseline. Qualitative analysis of the induced goal space further demonstrates some interesting emergent properties, including the goal composition and complex gameplay behavior synthesis. Code and video can be found on the website https://craftjarvis-groot.github.io.  ( 2 min )
    XIMAGENET-12: An Explainable AI Benchmark Dataset for Model Robustness Evaluation. (arXiv:2310.08182v1 [cs.CV])
    The lack of standardized robustness metrics and the widespread reliance on numerous unrelated benchmark datasets for testing have created a gap between academically validated robust models and their often problematic practical adoption. To address this, we introduce XIMAGENET-12, an explainable benchmark dataset with over 200K images and 15,600 manual semantic annotations. The dataset covers 12 categories from ImageNet, representing objects commonly encountered in practical life, and simulates six diverse scenarios, including overexposure, blurring, and color changing; building on it, we further propose a novel robustness criterion that extends beyond model generation ability assessment. This benchmark dataset, along with related code, is available at https://sites.google.com/view/ximagenet-12/home. Researchers and practitioners can leverage this resource to evaluate the robustness of their visual models under challenging conditions and ultimately better meet the demands of practical computer vision systems.  ( 2 min )
    Samples on Thin Ice: Re-Evaluating Adversarial Pruning of Neural Networks. (arXiv:2310.08073v1 [cs.LG])
    Neural network pruning has been shown to be an effective technique for reducing the network size, trading desirable properties like generalization and robustness to adversarial attacks for higher sparsity. Recent work has claimed that adversarial pruning methods can produce sparse networks while also preserving robustness to adversarial examples. In this work, we first re-evaluate three state-of-the-art adversarial pruning methods, showing that their robustness was indeed overestimated. We then compare pruned and dense versions of the same models, discovering that samples on thin ice, i.e., closer to the unpruned model's decision boundary, are typically misclassified after pruning. We conclude by discussing how this intuition may lead to designing more effective adversarial pruning methods in future work.  ( 2 min )
    Data driven modeling of self-similar dynamics. (arXiv:2310.08282v1 [cs.LG])
    Multiscale modeling of complex systems is crucial for understanding their intricacies. Data-driven multiscale modeling has emerged as a promising approach to tackle challenges associated with complex systems. On the other hand, self-similarity is prevalent in complex systems, hinting that large-scale complex systems can be modeled at a reduced cost. In this paper, we introduce a multiscale neural network framework that incorporates self-similarity as prior knowledge, facilitating the modeling of self-similar dynamical systems. For deterministic dynamics, our framework can discern whether the dynamics are self-similar. For uncertain dynamics, it can compare and determine which parameter set is closer to self-similarity. The framework allows us to extract scale-invariant kernels from the dynamics for modeling at any scale. Moreover, our method can identify the power law exponents in self-similar systems. Preliminary tests on the Ising model yielded critical exponents consistent with theoretical expectations, providing valuable insights for addressing critical phase transitions in non-equilibrium systems.  ( 2 min )
    To token or not to token: A Comparative Study of Text Representations for Cross-Lingual Transfer. (arXiv:2310.08078v1 [cs.CL])
    Choosing an appropriate tokenization scheme is often a bottleneck in low-resource cross-lingual transfer. To understand the downstream implications of text representation choices, we perform a comparative analysis on language models having diverse text representation modalities including 2 segmentation-based models (\texttt{BERT}, \texttt{mBERT}), 1 image-based model (\texttt{PIXEL}), and 1 character-level model (\texttt{CANINE}). First, we propose a Language Quotient (LQ) scoring metric that combines zero-shot and few-shot evaluation into a single weighted measure. Utilizing this metric, we perform experiments comprising 19 source languages and 133 target languages on three tasks (POS tagging, Dependency parsing, and NER). Our analysis reveals that image-based models excel in cross-lingual transfer when languages are closely related and share visually similar scripts. However, for tasks biased toward word meaning (POS, NER), segmentation-based models prove to be superior. Furthermore, in dependency parsing tasks, where word relationships play a crucial role, models with a character-level focus outperform others. Finally, we propose a recommendation scheme based on our findings to guide model selection according to task and language requirements.  ( 2 min )
    Learning from Label Proportions: Bootstrapping Supervised Learners via Belief Propagation. (arXiv:2310.08056v1 [cs.LG])
    Learning from Label Proportions (LLP) is a learning problem where only aggregate level labels are available for groups of instances, called bags, during training, and the aim is to get the best performance at the instance-level on the test data. This setting arises in domains like advertising and medicine due to privacy considerations. We propose a novel algorithmic framework for this problem that iteratively performs two main steps. For the first step (Pseudo Labeling) in every iteration, we define a Gibbs distribution over binary instance labels that incorporates a) covariate information through the constraint that instances with similar covariates should have similar labels and b) the bag level aggregated label. We then use Belief Propagation (BP) to marginalize the Gibbs distribution to obtain pseudo labels. In the second step (Embedding Refinement), we use the pseudo labels to provide supervision for a learner that yields a better embedding. Further, we iterate on the two steps again by using the second step's embeddings as new covariates for the next iteration. In the final iteration, a classifier is trained using the pseudo labels. Our algorithm displays strong gains against several SOTA baselines (up to 15%) for the LLP Binary Classification problem on various dataset types - tabular and image. We achieve these improvements with minimal computational overhead from Belief Propagation beyond standard supervised learning, even for large bag sizes and datasets of up to a million samples.  ( 2 min )
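    The alternation between pseudo labeling and supervised refitting can be pictured with the deliberately simplified loop below. The bag-wise ranking step stands in for the paper's Gibbs-distribution/Belief-Propagation marginalization, and the logistic-regression learner and the proportion-based initialization are illustrative assumptions rather than the paper's design.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        def rank_match_proportion(scores, p):
            """Pseudo-label one bag by ranking scores and marking the top p fraction positive
            (a simplified stand-in for the Gibbs/BP pseudo-labeling step)."""
            y = np.zeros(len(scores), dtype=int)
            y[np.argsort(-scores)[: int(round(p * len(scores)))]] = 1
            return y

        def llp_bootstrap(bags, proportions, n_iters=3):
            """Alternate pseudo labeling and supervised refitting (hypothetical simplified loop)."""
            X = np.vstack(bags)
            # Initialize instance scores with the bag proportions themselves.
            scores = np.concatenate([np.full(len(b), p) for b, p in zip(bags, proportions)])
            splits = np.cumsum([len(b) for b in bags])[:-1]
            for _ in range(n_iters):
                y = np.concatenate([rank_match_proportion(s, p)
                                    for s, p in zip(np.split(scores, splits), proportions)])
                clf = LogisticRegression(max_iter=1000).fit(X, y)
                scores = clf.predict_proba(X)[:, 1]
            return clf

        rng = np.random.default_rng(0)
        bags = [rng.normal(m, 1.0, size=(20, 5)) for m in (0.0, 1.0, 2.0)]
        llp_bootstrap(bags, proportions=[0.2, 0.5, 0.8])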
    Overview of Physics-Informed Machine Learning Inversion of Geophysical Data. (arXiv:2310.08109v1 [physics.geo-ph])
    We review four types of algorithms for physics-informed machine learning (PIML) inversion of geophysical data. The unifying equation is given by the joint objective function $\epsilon$: \begin{eqnarray} \epsilon^{||-PIML}&=&\lambda_1 \overbrace{||{\bf W}^{ML}({\bf H}_{{\bf w}} {\bf d}^{obs}-{\bf m})||^2}^{NN} + \lambda_2 \overbrace{{||{\bf W}^{FWI}({\bf L} {\bf m}-{\bf d}^{obs})||^2}}^{FWI} ~+ \nonumber\\ \nonumber\\ && ~~Regularizer, \label{PIML.eq120} \end{eqnarray}where the optimal model ${\bf m}^*$ and weights $\bf w^*$ minimize $\epsilon$. Here, the matrix weights are given by the boldface symbol $\bf W$, and full waveform inversion (FWI) is typically computed using a finite-difference solution of the wave equation, where $\bf L$ represents the forward modeling operation of the wave equation as a function of the model $\bf m$. Also, a fully-connected neural network (NN) is used to compute the model ${\bf H_w}{\bf d}^{obs} \approx \bf m$ from the observed input data ${\bf d}^{obs}$. The selection of weights $\lambda_i$ and the NN operations determine one of four different PIML algorithms. PIML offers potential advantages over standard FWI through its enhanced ability to avoid local minima and the option to locally train the inversion operator, minimizing the requirement for extensive training data for global applicability. However, the effectiveness of PIML relies on the similarity between the test and trained data. Nevertheless, a possible strategy to overcome this limitation involves initial pretraining of a PIML architecture with data from a broader region, followed by fine-tuning on specific data, a method reminiscent of the way large language models are pretrained and adapted for various tasks.  ( 2 min )
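    Read as code, the joint objective is simply a weighted sum of the network misfit and the FWI data misfit. The sketch below is a direct numpy transcription under the simplifying assumptions that the forward operator ${\bf L}$ is available as a matrix and that nn_forward stands for the network map ${\bf H_w}$ applied to the observed data.

        import numpy as np

        def piml_objective(m, d_obs, L, W_ml, W_fwi, lam1, lam2, nn_forward, regularizer=0.0):
            """lam1 * ||W_ml (H_w d_obs - m)||^2 + lam2 * ||W_fwi (L m - d_obs)||^2 + Regularizer."""
            nn_misfit  = np.sum((W_ml  @ (nn_forward(d_obs) - m)) ** 2)   # network term
            fwi_misfit = np.sum((W_fwi @ (L @ m - d_obs)) ** 2)           # physics (FWI) term
            return lam1 * nn_misfit + lam2 * fwi_misfit + regularizer

    Setting one of the $\lambda_i$ to zero recovers a purely data-driven or purely physics-driven inversion; per the abstract, the four PIML variants correspond to different choices of these weights and of how the network term is used.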
    LGL-BCI: A Lightweight Geometric Learning Framework for Motor Imagery-Based Brain-Computer Interfaces. (arXiv:2310.08051v1 [cs.LG])
    Brain-Computer Interfaces (BCIs) are a groundbreaking technology for interacting with external devices using brain signals. Despite advancements, electroencephalogram (EEG)-based Motor Imagery (MI) tasks face challenges like amplitude and phase variability, and complex spatial correlations, with a need for smaller model size and faster inference. This study introduces the LGL-BCI framework, employing a Geometric Deep Learning Framework for EEG processing in non-Euclidean metric spaces, particularly the Symmetric Positive Definite (SPD) Manifold space. LGL-BCI offers robust EEG data representation and captures spatial correlations. We propose an EEG channel selection solution via a feature decomposition algorithm to reduce SPD matrix dimensionality, with a lossless transformation boosting inference speed. Extensive experiments show LGL-BCI's superior accuracy and efficiency compared to current solutions, highlighting geometric deep learning's potential in MI-BCI applications. The efficiency, assessed on two public EEG datasets and two real-world EEG devices, significantly outperforms the state-of-the-art solution in accuracy ($82.54\%$ versus $62.22\%$) with fewer parameters (64.9M compared to 183.7M).  ( 2 min )
    SimCKP: Simple Contrastive Learning of Keyphrase Representations. (arXiv:2310.08221v1 [cs.CL])
    Keyphrase generation (KG) aims to generate a set of summarizing words or phrases given a source document, while keyphrase extraction (KE) aims to identify them from the text. Because the search space is much smaller in KE, it is often combined with KG to predict keyphrases that may or may not exist in the corresponding document. However, current unified approaches adopt sequence labeling and maximization-based generation that primarily operate at a token level, falling short in observing and scoring keyphrases as a whole. In this work, we propose SimCKP, a simple contrastive learning framework that consists of two stages: 1) An extractor-generator that extracts keyphrases by learning context-aware phrase-level representations in a contrastive manner while also generating keyphrases that do not appear in the document; 2) A reranker that adapts scores for each generated phrase by likewise aligning their representations with the corresponding document. Experimental results on multiple benchmark datasets demonstrate the effectiveness of our proposed approach, which outperforms the state-of-the-art models by a significant margin.  ( 2 min )
    Multi-SpacePhish: Extending the Evasion-space of Adversarial Attacks against Phishing Website Detectors using Machine Learning. (arXiv:2210.13660v3 [cs.CR] UPDATED)
    Existing literature on adversarial Machine Learning (ML) focuses either on showing attacks that break every ML model, or defenses that withstand most attacks. Unfortunately, little consideration is given to the actual feasibility of the attack or the defense. Moreover, adversarial samples are often crafted in the "feature-space", making the corresponding evaluations of questionable value. Simply put, the current situation does not allow one to estimate the actual threat posed by adversarial attacks, leading to a lack of secure ML systems. We aim to clarify such confusion in this paper. By considering the application of ML for Phishing Website Detection (PWD), we formalize the "evasion-space" in which an adversarial perturbation can be introduced to fool an ML-PWD -- demonstrating that even perturbations in the "feature-space" are useful. Then, we propose a realistic threat model describing evasion attacks against ML-PWD that are cheap to stage, and hence intrinsically more attractive for real phishers. After that, we perform the first statistically validated assessment of state-of-the-art ML-PWD against 12 evasion attacks. Our evaluation shows (i) the true efficacy of evasion attempts that are more likely to occur; and (ii) the impact of perturbations crafted in different evasion-spaces. Our realistic evasion attempts induce a statistically significant degradation (3-10% at p<0.05), and their cheap cost makes them a subtle threat. Notably, however, some ML-PWD are immune to our most realistic attacks (p=0.22). Finally, as an additional contribution of this journal publication, we are the first to consider the intriguing case wherein an attacker introduces perturbations in multiple evasion-spaces at the same time. These new results show that simultaneously applying perturbations in the problem- and feature-space can cause a drop in the detection rate from 0.95 to 0.  ( 3 min )
    PRiSM: Enhancing Low-Resource Document-Level Relation Extraction with Relation-Aware Score Calibration. (arXiv:2309.13869v1 [cs.CL] CROSS LISTED)
    Document-level relation extraction (DocRE) aims to extract relations of all entity pairs in a document. A key challenge in DocRE is the cost of annotating such data which requires intensive human effort. Thus, we investigate the case of DocRE in a low-resource setting, and we find that existing models trained on low data overestimate the NA ("no relation") label, causing limited performance. In this work, we approach the problem from a calibration perspective and propose PRiSM, which learns to adapt logits based on relation semantic information. We evaluate our method on three DocRE datasets and demonstrate that integrating existing models with PRiSM improves performance by as much as 26.38 F1 score, while the calibration error drops as much as 36 times when trained with about 3% of data. The code is publicly available at https://github.com/brightjade/PRiSM.  ( 2 min )
    Model-Free, Regret-Optimal Best Policy Identification in Online CMDPs. (arXiv:2309.15395v2 [cs.LG] UPDATED)
    This paper considers the best policy identification (BPI) problem in online Constrained Markov Decision Processes (CMDPs). We are interested in algorithms that are model-free, have low regret, and identify an optimal policy with a high probability. Existing model-free algorithms for online CMDPs with sublinear regret and constraint violation do not provide any convergence guarantee to an optimal policy and provide only average performance guarantees when a policy is uniformly sampled at random from all previously used policies. In this paper, we develop a new algorithm, named Pruning-Refinement-Identification (PRI), based on a fundamental structural property of CMDPs we discover, called limited stochasticity. The property says for a CMDP with $N$ constraints, there exists an optimal policy with at most $N$ stochastic decisions. The proposed algorithm first identifies at which step and in which state a stochastic decision has to be taken and then fine-tunes the distributions of these stochastic decisions. PRI achieves three objectives: (i) it is a model-free algorithm; (ii) it outputs a near-optimal policy with high probability at the end of learning; and (iii) in the tabular setting, PRI guarantees $\tilde{\mathcal{O}}(\sqrt{K})$ regret and constraint violation, which significantly improves the best existing regret bound $\tilde{\mathcal{O}}(K^{\frac{4}{5}})$ under a model-free algorithm, where $K$ is the total number of episodes.  ( 2 min )
    Rethinking the BERT-like Pretraining for DNA Sequences. (arXiv:2310.07644v2 [cs.AI] UPDATED)
    With the success of large-scale pretraining in NLP, there is an increasing trend of applying it to the domain of life sciences. In particular, pretraining methods based on DNA sequences have garnered growing attention due to their potential to capture generic information about genes. However, existing pretraining methods for DNA sequences largely rely on direct adoptions of BERT pretraining from NLP, lacking a comprehensive understanding and a specifically tailored approach. To address this research gap, we first conducted a series of exploratory experiments and gained several insightful observations: 1) In the fine-tuning phase of downstream tasks, when using K-mer overlapping tokenization instead of K-mer non-overlapping tokenization, both overlapping and non-overlapping pretraining weights show consistent performance improvement. 2) During the pre-training process, using K-mer overlapping tokenization quickly produces clear K-mer embeddings and reduces the loss to a very low level, while using K-mer non-overlapping tokenization results in less distinct embeddings and continuously decreases the loss. 3) Using overlapping tokenization causes the self-attention in the intermediate layers of pre-trained models to tend to overly focus on certain tokens, reflecting that these layers are not adequately optimized. In summary, overlapping tokenization can benefit the fine-tuning of downstream tasks but leads to inadequate pretraining with fast convergence. To unleash the pretraining potential, we introduce a novel approach called RandomMask, which gradually increases the task difficulty of BERT-like pretraining by continuously expanding its mask boundary, forcing the model to learn more knowledge. RandomMask is simple but effective, achieving top-tier performance on 26 of 28 datasets spanning 7 downstream tasks.  ( 3 min )
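    A rough sketch of the boundary-expanding idea is shown below: the maximum width of each masked span grows with training progress, so reconstruction gets harder over time. The schedule, the span sampling, and the 15% masking rate are illustrative assumptions, not the paper's exact settings.

        import random

        def random_mask_spans(tokens, step, total_steps, mask_token="[MASK]",
                              min_span=1, max_span=8, mask_rate=0.15):
            """Mask contiguous spans whose maximum width expands as training progresses
            (hypothetical sketch of the boundary-expanding schedule)."""
            span_cap = min_span + int((max_span - min_span) * step / total_steps)
            out, n_to_mask, masked = list(tokens), int(mask_rate * len(tokens)), 0
            while masked < n_to_mask:
                width = random.randint(min_span, span_cap)
                start = random.randrange(0, max(1, len(out) - width))
                for i in range(start, min(start + width, len(out))):
                    out[i] = mask_token
                masked += width
            return out

        # Early vs. late in training on a toy 3-mer token sequence.
        seq = ["ATG", "CGT", "TAA", "GGC", "ACT", "TTG", "CAG", "GAT"] * 4
        early = random_mask_spans(seq, step=0, total_steps=100_000)
        late  = random_mask_spans(seq, step=100_000, total_steps=100_000)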
    Memorization Capacity of Multi-Head Attention in Transformers. (arXiv:2306.02010v2 [cs.LG] UPDATED)
    Transformers have become the go-to architecture for language and vision tasks, yet their theoretical properties, especially memorization capacity, remain elusive. This paper investigates the memorization abilities of multi-head attention mechanisms, examining how many example sequences they can memorize, as a function of the number of heads and sequence length. Motivated by experimental findings on vision transformers, we introduce novel assumptions about the linear independence of input data, distinct from the commonly used general-position assumption. Under these assumptions, we demonstrate that an attention layer with $H$ heads, dimension $d$, and context size $n < d$, featuring $\Theta(Hd^2)$ parameters, can memorize $\Omega(Hn)$ examples. Our analysis sheds light on how different attention heads handle various example sequences, aided by the softmax operator's saturation property. We validate our findings through experiments on synthetic data.  ( 2 min )
    CRITERIA: a New Benchmarking Paradigm for Evaluating Trajectory Prediction Models for Autonomous Driving. (arXiv:2310.07794v1 [cs.CV])
    Benchmarking is a common method for evaluating trajectory prediction models for autonomous driving. Existing benchmarks rely on datasets, which are biased towards more common scenarios, such as cruising, and distance-based metrics that are computed by averaging over all scenarios. Following such a regimen provides little insight into the properties of the models both in terms of how well they can handle different scenarios and how admissible and diverse their outputs are. There exist a number of complementary metrics designed to measure the admissibility and diversity of trajectories; however, they suffer from biases, such as the length of trajectories. In this paper, we propose a new benChmarking paRadIgm for evaluaTing trajEctoRy predIction Approaches (CRITERIA). Particularly, we propose 1) a method for extracting driving scenarios at varying levels of specificity according to the structure of the roads, models' performance, and data properties for fine-grained ranking of prediction models; 2) a set of new bias-free metrics for measuring diversity, by incorporating the characteristics of a given scenario, and admissibility, by considering the structure of roads and kinematic compliancy, motivated by real-world driving constraints. 3) Using the proposed benchmark, we conduct extensive experimentation on a representative set of the prediction models using the large scale Argoverse dataset. We show that the proposed benchmark can produce a more accurate ranking of the models and serve as a means of characterizing their behavior. We further present ablation studies to highlight contributions of different elements that are used to compute the proposed metrics.  ( 3 min )
    Dynamic Subgoal-based Exploration via Bayesian Optimization. (arXiv:1910.09143v5 [math.OC] UPDATED)
    Reinforcement learning in sparse-reward navigation environments with expensive and limited interactions is challenging and poses a need for effective exploration. Motivated by complex navigation tasks that require real-world training (when cheap simulators are not available), we consider an agent that faces an unknown distribution of environments and must decide on an exploration strategy. It may leverage a series of training environments to improve its policy before it is evaluated in a test environment drawn from the same environment distribution. Most existing approaches focus on fixed exploration strategies, while the few that view exploration as a meta-optimization problem tend to ignore the need for cost-efficient exploration. We propose a cost-aware Bayesian optimization approach that efficiently searches over a class of dynamic subgoal-based exploration strategies. The algorithm adjusts a variety of levers -- the locations of the subgoals, the length of each episode, and the number of replications per trial -- in order to overcome the challenges of sparse rewards, expensive interactions, and noise. An experimental evaluation demonstrates that the new approach outperforms existing baselines across a number of problem domains. We also provide a theoretical foundation and prove that the method asymptotically identifies a near-optimal subgoal design.  ( 2 min )
    Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning. (arXiv:2301.10886v5 [cs.LG] UPDATED)
    We present AIRS: Automatic Intrinsic Reward Shaping that intelligently and adaptively provides high-quality intrinsic rewards to enhance exploration in reinforcement learning (RL). More specifically, AIRS selects a shaping function from a predefined set based on the estimated task return in real time, providing reliable exploration incentives and alleviating the biased objective problem. Moreover, we develop an intrinsic reward toolkit to provide efficient and reliable implementations of diverse intrinsic reward approaches. We test AIRS on various tasks of MiniGrid, Procgen, and DeepMind Control Suite. Extensive simulation demonstrates that AIRS can outperform the benchmarking schemes and achieve superior performance with simple architecture.  ( 2 min )
    Refined Mechanism Design for Approximately Structured Priors via Active Regression. (arXiv:2310.07874v1 [cs.GT])
    We consider the problem of a revenue-maximizing seller with a large number of items $m$ for sale to $n$ strategic bidders, whose valuations are drawn independently from high-dimensional, unknown prior distributions. It is well-known that optimal and even approximately-optimal mechanisms for this setting are notoriously difficult to characterize or compute, and, even when they can be found, are often rife with various counter-intuitive properties. In this paper, following a model introduced recently by Cai and Daskalakis~\cite{cai2022recommender}, we consider the case that bidders' prior distributions can be well-approximated by a topic model. We design an active learning component, responsible for interacting with the bidders and outputting low-dimensional approximations of their types, and a mechanism design component, responsible for robustifying mechanisms for the low-dimensional model to work for the approximate types of the former component. On the active learning front, we cast our problem in the framework of Randomized Linear Algebra (RLA) for regression problems, allowing us to import several breakthrough results from that line of research, and adapt them to our setting. On the mechanism design front, we remove many restrictive assumptions of prior work on the type of access needed to the underlying distributions and the associated mechanisms. To the best of our knowledge, our work is the first to formulate connections between mechanism design, and RLA for active learning of regression problems, opening the door for further applications of randomized linear algebra primitives to mechanism design.  ( 3 min )
    SEE-OoD: Supervised Exploration For Enhanced Out-of-Distribution Detection. (arXiv:2310.08040v1 [cs.LG])
    Current techniques for Out-of-Distribution (OoD) detection predominantly rely on quantifying predictive uncertainty and incorporating model regularization during the training phase, using either real or synthetic OoD samples. However, methods that utilize real OoD samples lack exploration and are prone to overfitting the OoD samples at hand, whereas synthetic samples are often generated from features extracted from the training data, rendering them less effective when the training and OoD data overlap heavily in the feature space. In this work, we propose a Wasserstein-score-based generative adversarial training scheme to enhance OoD detection accuracy, which, for the first time, performs data augmentation and exploration simultaneously under the supervision of limited OoD samples. Specifically, the generator explores OoD spaces and generates synthetic OoD samples using feedback from the discriminator, while the discriminator exploits both the observed and synthesized samples for OoD detection using a predefined Wasserstein score. We provide theoretical guarantees that the optimal solutions of our generative scheme are statistically achievable through adversarial training in empirical settings. We then demonstrate that the proposed method outperforms state-of-the-art techniques on various computer vision datasets and exhibits superior generalizability to unseen OoD data.  ( 2 min )
    Open-Set Knowledge-Based Visual Question Answering with Inference Paths. (arXiv:2310.08148v1 [cs.LG])
    Given an image and an associated textual question, the purpose of Knowledge-Based Visual Question Answering (KB-VQA) is to provide a correct answer to the question with the aid of external knowledge bases. Prior KB-VQA models are usually formulated as a retriever-classifier framework, where a pre-trained retriever extracts textual or visual information from knowledge graphs and then makes a prediction among the candidates. Despite promising progress, there are two drawbacks with existing models. Firstly, modeling question-answering as multi-class classification limits the answer space to a preset corpus and lacks the ability to reason flexibly. Secondly, the classifier merely considers "what is the answer" without "how to get the answer", and thus cannot ground the answer in explicit reasoning paths. In this paper, we confront the challenge of \emph{explainable open-set} KB-VQA, where the system is required to answer questions with entities in the wild while retaining an explainable reasoning path. To resolve the aforementioned issues, we propose a new retriever-ranker paradigm of KB-VQA, Graph pATH rankER (GATHER for brevity). Specifically, it contains graph constructing, pruning, and path-level ranking, which not only retrieves accurate answers but also provides inference paths that explain the reasoning process. To comprehensively evaluate our model, we reformulate the benchmark dataset OK-VQA with manually corrected entity-level annotations and release it as ConceptVQA. Extensive experiments on real-world questions demonstrate that our framework is not only able to perform open-set question answering across the whole knowledge base but also provides explicit reasoning paths.  ( 2 min )
    Variational Imbalanced Regression: Fair Uncertainty Quantification via Probabilistic Smoothing. (arXiv:2306.06599v4 [cs.LG] UPDATED)
    Existing regression models tend to fall short in both accuracy and uncertainty estimation when the label distribution is imbalanced. In this paper, we propose a probabilistic deep learning model, dubbed variational imbalanced regression (VIR), which not only performs well in imbalanced regression but naturally produces reasonable uncertainty estimation as a byproduct. Different from typical variational autoencoders assuming I.I.D. representations (a data point's representation is not directly affected by other data points), our VIR borrows data with similar regression labels to compute the latent representation's variational distribution; furthermore, different from deterministic regression models producing point estimates, VIR predicts the entire normal-inverse-gamma distributions and modulates the associated conjugate distributions to impose probabilistic reweighting on the imbalanced data, thereby providing better uncertainty estimation. Experiments in several real-world datasets show that our VIR can outperform state-of-the-art imbalanced regression models in terms of both accuracy and uncertainty estimation. Code will soon be available at \url{https://github.com/Wang-ML-Lab/variational-imbalanced-regression}.  ( 2 min )
    A Generic Software Framework for Distributed Topological Analysis Pipelines. (arXiv:2310.08339v1 [cs.DC])
    This system paper presents a software framework for the support of topological analysis pipelines in a distributed-memory model. While several recent papers introduced topology-based approaches for distributed-memory environments, these were reporting experiments obtained with tailored, mono-algorithm implementations. In contrast, we describe in this paper a general-purpose, generic framework for topological analysis pipelines, i.e. a sequence of topological algorithms interacting together, possibly on distinct numbers of processes. Specifically, we instantiated our framework with the MPI model, within the Topology ToolKit (TTK). While developing this framework, we faced several algorithmic and software engineering challenges, which we document in this paper. We provide a taxonomy for the distributed-memory topological algorithms supported by TTK, depending on their communication needs and provide examples of hybrid MPI+thread parallelizations. Detailed performance analyses show that parallel efficiencies range from $20\%$ to $80\%$ (depending on the algorithms), and that the MPI-specific preconditioning introduced by our framework induces a negligible computation time overhead. We illustrate the new distributed-memory capabilities of TTK with an example of advanced analysis pipeline, combining multiple algorithms, run on the largest publicly available dataset we have found (120 billion vertices) on a standard cluster with 64 nodes (for a total of 1,536 cores). Finally, we provide a roadmap for the completion of TTK's MPI extension, along with generic recommendations for each algorithm communication category.  ( 3 min )
    Trustworthy Machine Learning. (arXiv:2310.08215v1 [cs.LG])
    As machine learning technology gets applied to actual products and solutions, new challenges have emerged. Models unexpectedly fail to generalize to small changes in the distribution, tend to be confident on novel data they have never seen, or cannot communicate the rationale behind their decisions effectively with the end users. Collectively, we face a trustworthiness issue with the current machine learning technology. This textbook on Trustworthy Machine Learning (TML) covers a theoretical and technical background of four key topics in TML: Out-of-Distribution Generalization, Explainability, Uncertainty Quantification, and Evaluation of Trustworthiness. We discuss important classical and contemporary research papers of the aforementioned fields and uncover and connect their underlying intuitions. The book evolved from the homonymous course at the University of T\"ubingen, first offered in the Winter Semester of 2022/23. It is meant to be a stand-alone product accompanied by code snippets and various pointers to further sources on topics of TML. The dedicated website of the book is https://trustworthyml.io/.  ( 2 min )
    Towards a Unified Analysis of Kernel-based Methods Under Covariate Shift. (arXiv:2310.08237v1 [stat.ML])
    Covariate shift occurs prevalently in practice, where the input distributions of the source and target data are substantially different. Despite its practical importance in various learning problems, most of the existing methods only focus on some specific learning tasks and are not well validated theoretically and numerically. To tackle this problem, we propose a unified analysis of general nonparametric methods in a reproducing kernel Hilbert space (RKHS) under covariate shift. Our theoretical results are established for a general loss belonging to a rich loss function family, which includes many commonly used methods as special cases, such as mean regression, quantile regression, likelihood-based classification, and margin-based classification. Two types of covariate shift problems are the focus of this paper and the sharp convergence rates are established for a general loss function to provide a unified theoretical analysis, which concurs with the optimal results in literature where the squared loss is used. Extensive numerical studies on synthetic and real examples confirm our theoretical findings and further illustrate the effectiveness of our proposed method.  ( 2 min )
    Multi-View Variational Autoencoder for Missing Value Imputation in Untargeted Metabolomics. (arXiv:2310.07990v1 [q-bio.GN])
    Background: Missing data is a common challenge in mass spectrometry-based metabolomics, which can lead to biased and incomplete analyses. The integration of whole-genome sequencing (WGS) data with metabolomics data has emerged as a promising approach to enhance the accuracy of data imputation in metabolomics studies. Method: In this study, we propose a novel method that leverages the information from WGS data and reference metabolites to impute unknown metabolites. Our approach utilizes a multi-view variational autoencoder to jointly model the burden score, polygenetic risk score (PGS), and linkage disequilibrium (LD) pruned single nucleotide polymorphisms (SNPs) for feature extraction and missing metabolomics data imputation. By learning the latent representations of both omics data, our method can effectively impute missing metabolomics values based on genomic information. Results: We evaluate the performance of our method on empirical metabolomics datasets with missing values and demonstrate its superiority compared to conventional imputation techniques. Using burden scores derived from 35 template metabolites, PGS, and LD-pruned SNPs, the proposed method achieved r2-scores > 0.01 for 71.55% of metabolites. Conclusion: The integration of WGS data in metabolomics imputation not only improves data completeness but also enhances downstream analyses, paving the way for more comprehensive and accurate investigations of metabolic pathways and disease associations. Our findings offer valuable insights into the potential benefits of utilizing WGS data for metabolomics data imputation and underscore the importance of leveraging multi-modal data integration in precision medicine research.  ( 3 min )
    Generative Modeling with Phase Stochastic Bridges. (arXiv:2310.07805v1 [cs.LG])
    Diffusion models (DMs) represent state-of-the-art generative models for continuous inputs. DMs work by constructing a Stochastic Differential Equation (SDE) in the input space (ie, position space), and using a neural network to reverse it. In this work, we introduce a novel generative modeling framework grounded in \textbf{phase space dynamics}, where a phase space is defined as an augmented space encompassing both position and velocity. Leveraging insights from Stochastic Optimal Control, we construct a path measure in the phase space that enables efficient sampling. In contrast to DMs, our framework demonstrates the capability to generate realistic data points at an early stage of dynamics propagation. This early prediction sets the stage for efficient data generation by leveraging additional velocity information along the trajectory. On standard image generation benchmarks, our model yields favorable performance over baselines in the regime of small Number of Function Evaluations (NFEs). Furthermore, our approach rivals the performance of diffusion models equipped with efficient sampling techniques, underscoring its potential as a new tool for generative modeling.  ( 2 min )
    Optimizing Convolutional Neural Networks for Chronic Obstructive Pulmonary Disease Detection in Clinical Computed Tomography Imaging. (arXiv:2303.07189v3 [eess.IV] UPDATED)
    We aim to optimize the binary detection of Chronic Obstructive Pulmonary Disease (COPD) based on emphysema presence in the lung with convolutional neural networks (CNN) by exploring manually adjusted versus automated window-setting optimization (WSO) on computed tomography (CT) images. 7,194 CT images (3,597 with COPD; 3,597 healthy controls) from 78 subjects (43 with COPD; 35 healthy controls) were selected retrospectively (10.2018-12.2019) and preprocessed. For each image, intensity values were manually clipped to the emphysema window setting and a baseline 'full-range' window setting. Class-balanced train, validation, and test sets contained 3,392, 1,114, and 2,688 images. The network backbone was optimized by comparing various CNN architectures. Furthermore, automated WSO was implemented by adding a customized layer to the model. The image-level area under the Receiver Operating Characteristics curve (AUC) [lower, upper limit 95% confidence] was utilized to compare model variations. Repeated inference (n=7) on the test set showed that the DenseNet was the most efficient backbone and achieved a mean AUC of 0.80 [0.76, 0.85] without WSO. Comparably, with input images manually adjusted to the emphysema window, the DenseNet model predicted COPD with a mean AUC of 0.86 [0.82, 0.89]. By adding a customized WSO layer to the DenseNet, an optimal window in the proximity of the emphysema window setting was learned automatically, and a mean AUC of 0.82 [0.78, 0.86] was achieved. Detection of COPD with DenseNet models was improved by WSO of CT data to the emphysema window setting range.  ( 3 min )
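    The manual preprocessing described here amounts to clipping Hounsfield units to a window and rescaling, as sketched below; the specific window level/width values are placeholders, since the study's exact emphysema and full-range settings are not reproduced here. The automated alternative makes the level and width learnable parameters of a first, differentiable network layer.

        import numpy as np

        def apply_window(ct_hu, level, width):
            """Clip a CT image (in Hounsfield units) to a window and rescale to [0, 1]."""
            lo, hi = level - width / 2, level + width / 2
            return (np.clip(ct_hu, lo, hi) - lo) / (hi - lo)

        ct = np.random.randint(-1024, 400, size=(512, 512)).astype(np.float32)   # dummy scan
        emphysema_view  = apply_window(ct, level=-900, width=200)    # assumed narrow lung window
        full_range_view = apply_window(ct, level=-312, width=1424)   # assumed 'full-range' baseline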
    High-fidelity Pseudo-labels for Boosting Weakly-Supervised Segmentation. (arXiv:2304.02621v2 [cs.CV] UPDATED)
    Image-level weakly-supervised semantic segmentation (WSSS) reduces the usually vast data annotation cost by surrogate segmentation masks during training. The typical approach involves training an image classification network using global average pooling (GAP) on convolutional feature maps. This enables the estimation of object locations based on class activation maps (CAMs), which identify the importance of image regions. The CAMs are then used to generate pseudo-labels, in the form of segmentation masks, to supervise a segmentation model in the absence of pixel-level ground truth. Our work is based on two techniques for improving CAMs; importance sampling, which is a substitute for GAP, and the feature similarity loss, which utilizes a heuristic that object contours almost always align with color edges in images. However, both are based on the multinomial posterior with softmax, and implicitly assume that classes are mutually exclusive, which turns out suboptimal in our experiments. Thus, we reformulate both techniques based on binomial posteriors of multiple independent binary problems. This has two benefits; their performance is improved and they become more general, resulting in an add-on method that can boost virtually any WSSS method. This is demonstrated on a wide variety of baselines on the PASCAL VOC dataset, improving the region similarity and contour quality of all implemented state-of-the-art methods. Experiments on the MS COCO dataset show that our proposed add-on is well-suited for large-scale settings. Our code is available at https://github.com/arvijj/hfpl.  ( 3 min )
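    The core reformulation, replacing a softmax (multinomial) posterior with independent sigmoid (binomial) posteriors per class, can be illustrated as below. This shows only the loss-level distinction, not the paper's full importance-sampling or feature-similarity machinery.

        import torch
        import torch.nn.functional as F

        logits = torch.randn(4, 20)                        # 4 images, 20 foreground classes
        multi_hot = (torch.rand(4, 20) > 0.8).float()      # an image may contain several classes

        # Multinomial posterior: classes compete through softmax (implicitly mutually exclusive).
        ce_loss = F.cross_entropy(logits, multi_hot.argmax(dim=1))

        # Binomial posteriors: one independent binary problem per class (no exclusivity assumption).
        bce_loss = F.binary_cross_entropy_with_logits(logits, multi_hot)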
    Differentially-Private Decision Trees and Provable Robustness to Data Poisoning. (arXiv:2305.15394v2 [cs.LG] UPDATED)
    Decision trees are interpretable models that are well-suited to non-linear learning problems. Much work has been done on extending decision tree learning algorithms with differential privacy, a system that guarantees the privacy of samples within the training data. However, current state-of-the-art algorithms for this purpose sacrifice much utility for a small privacy benefit. These solutions create random decision nodes that reduce decision tree accuracy or spend an excessive share of the privacy budget on labeling leaves. Moreover, many works do not support continuous features or leak information about them. We propose a new method called PrivaTree based on private histograms that chooses good splits while consuming a small privacy budget. The resulting trees provide a significantly better privacy-utility trade-off and accept mixed numerical and categorical data without leaking information about numerical features. Finally, while it is notoriously hard to give robustness guarantees against data poisoning attacks, we demonstrate bounds for the expected accuracy and success rates of backdoor attacks against differentially-private learners. By leveraging the better privacy-utility trade-off of PrivaTree we are able to train decision trees with significantly better robustness against backdoor attacks compared to regular decision trees and with meaningful theoretical guarantees.  ( 2 min )
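    A minimal sketch of choosing a split from a Laplace-noised class histogram is given below, assuming a pre-specified binning and a crude class-separation score; PrivaTree's actual split scoring and privacy-budget accounting are more careful than this.

        import numpy as np

        def private_histogram_split(x_feature, y, bin_edges, epsilon, rng):
            """Pick a split threshold from a Laplace-noised class histogram (illustrative sketch)."""
            n_bins = len(bin_edges) - 1
            counts = np.zeros((n_bins, 2))
            bins = np.clip(np.digitize(x_feature, bin_edges) - 1, 0, n_bins - 1)
            for b, label in zip(bins, y):
                counts[b, label] += 1
            # Laplace mechanism for counting queries (sensitivity 1), clipped as post-processing.
            noisy = np.clip(counts + rng.laplace(scale=1.0 / epsilon, size=counts.shape), 0, None)
            cum, total = np.cumsum(noisy, axis=0), noisy.sum(axis=0)
            left_pos  = cum[:, 1] / np.maximum(cum.sum(axis=1), 1e-9)
            right_pos = (total[1] - cum[:, 1]) / np.maximum((total - cum).sum(axis=1), 1e-9)
            gains = np.abs(left_pos - right_pos)
            return bin_edges[1 + np.argmax(gains[:-1])]

        rng = np.random.default_rng(0)
        x = rng.normal(size=500); y = (x > 0.3).astype(int)
        threshold = private_histogram_split(x, y, np.linspace(-3, 3, 11), epsilon=1.0, rng=rng)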
    A Symmetry-Aware Exploration of Bayesian Neural Network Posteriors. (arXiv:2310.08287v1 [stat.ML])
    The distribution of the weights of modern deep neural networks (DNNs) - crucial for uncertainty quantification and robustness - is an eminently complex object due to its extremely high dimensionality. This paper proposes one of the first large-scale explorations of the posterior distribution of deep Bayesian Neural Networks (BNNs), expanding its study to real-world vision tasks and architectures. Specifically, we investigate the optimal approach for approximating the posterior, analyze the connection between posterior quality and uncertainty quantification, delve into the impact of modes on the posterior, and explore methods for visualizing the posterior. Moreover, we uncover weight-space symmetries as a critical aspect for understanding the posterior. To this extent, we develop an in-depth assessment of the impact of both permutation and scaling symmetries that tend to obfuscate the Bayesian posterior. While the first type of transformation is known for duplicating modes, we explore the relationship between the latter and L2 regularization, challenging previous misconceptions. Finally, to help the community improve our understanding of the Bayesian posterior, we will shortly release the first large-scale checkpoint dataset, including thousands of real-world models and our codes.
    Learning Transferable Conceptual Prototypes for Interpretable Unsupervised Domain Adaptation. (arXiv:2310.08071v1 [cs.LG])
    Despite the great progress of unsupervised domain adaptation (UDA) with the deep neural networks, current UDA models are opaque and cannot provide promising explanations, limiting their applications in the scenarios that require safe and controllable model decisions. At present, a surge of work focuses on designing deep interpretable methods with adequate data annotations and only a few methods consider the distributional shift problem. Most existing interpretable UDA methods are post-hoc ones, which cannot facilitate the model learning process for performance enhancement. In this paper, we propose an inherently interpretable method, named Transferable Conceptual Prototype Learning (TCPL), which could simultaneously interpret and improve the processes of knowledge transfer and decision-making in UDA. To achieve this goal, we design a hierarchically prototypical module that transfers categorical basic concepts from the source domain to the target domain and learns domain-shared prototypes for explaining the underlying reasoning process. With the learned transferable prototypes, a self-predictive consistent pseudo-label strategy that fuses confidence, predictions, and prototype information, is designed for selecting suitable target samples for pseudo annotations and gradually narrowing down the domain gap. Comprehensive experiments show that the proposed method can not only provide effective and intuitive explanations but also outperform previous state-of-the-art methods.  ( 2 min )
    Graph-SCP: Accelerating Set Cover Problems with Graph Neural Networks. (arXiv:2310.07979v1 [cs.LG])
    Machine learning (ML) approaches are increasingly being used to accelerate combinatorial optimization (CO) problems. We look specifically at the Set Cover Problem (SCP) and propose Graph-SCP, a graph neural network method that can augment existing optimization solvers by learning to identify a much smaller sub-problem that contains the solution space. We evaluate the performance of Graph-SCP on synthetic weighted and unweighted SCP instances with diverse problem characteristics and complexities, and on instances from the OR Library, a canonical benchmark for SCP. We show that Graph-SCP reduces the problem size by 30-70% and achieves run time speedups of up to 25x when compared to commercial solvers (Gurobi). Given a desired optimality threshold, Graph-SCP will improve upon it or even achieve 100% optimality. This is in contrast to fast greedy solutions that significantly compromise solution quality to achieve guaranteed polynomial run time. Graph-SCP can generalize to larger problem sizes and can be used with other conventional or ML-augmented CO solvers to lead to potential additional run time improvement.  ( 2 min )
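    Conceptually, the acceleration is a reduce-then-solve wrapper like the one below, where score stands in for the graph neural network's per-subset confidence and solve_scp is whatever exact or commercial solver is available. The greedy feasibility patch is an assumption about how an infeasible reduction might be handled, not the paper's mechanism.

        def reduce_and_solve_scp(universe, subsets, costs, score, solve_scp, keep_fraction=0.3):
            """Reduce a Set Cover instance to high-scoring subsets, then solve the sub-problem."""
            ranked = sorted(range(len(subsets)), key=score, reverse=True)
            keep = ranked[: max(1, int(keep_fraction * len(subsets)))]
            # Restore feasibility: add further subsets until the kept columns cover the universe.
            covered = set().union(*(subsets[i] for i in keep))
            for i in ranked[len(keep):]:
                if covered >= universe:
                    break
                if subsets[i] - covered:
                    keep.append(i)
                    covered |= subsets[i]
            chosen = solve_scp([costs[i] for i in keep], [subsets[i] for i in keep])
            return [keep[j] for j in chosen]   # map back to original subset indices

    Any speedup then comes from the solver spending its time on the 30-70% smaller instance reported in the abstract rather than on the full problem.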
    Hyperparameter Adaptive Search for Surrogate Optimization: A Self-Adjusting Approach. (arXiv:2310.07970v1 [cs.LG])
    Surrogate Optimization (SO) algorithms have shown promise for optimizing expensive black-box functions. However, their performance is heavily influenced by hyperparameters related to sampling and surrogate fitting, which poses a challenge to their widespread adoption. We investigate the impact of hyperparameters on various SO algorithms and propose a Hyperparameter Adaptive Search for SO (HASSO) approach. HASSO is not a hyperparameter tuning algorithm, but a generic self-adjusting SO algorithm that dynamically tunes its own hyperparameters while concurrently optimizing the primary objective function, without requiring additional evaluations. The aim is to improve the accessibility, effectiveness, and convergence speed of SO algorithms for practitioners. Our approach identifies and modifies the most influential hyperparameters specific to each problem and SO approach, reducing the need for manual tuning without significantly increasing the computational burden. Experimental results demonstrate the effectiveness of HASSO in enhancing the performance of various SO algorithms across different global optimization test problems.  ( 2 min )
    Discerning Temporal Difference Learning. (arXiv:2310.08091v1 [cs.LG])
    Temporal difference learning (TD) is a foundational concept in reinforcement learning (RL), aimed at efficiently assessing a policy's value function. TD($\lambda$), a potent variant, incorporates a memory trace to distribute the prediction error into the historical context. However, this approach often neglects the significance of historical states and the relative importance of propagating the TD error, influenced by challenges such as visitation imbalance or outcome noise. To address this, we propose a novel TD algorithm named discerning TD learning (DTD), which allows flexible emphasis functions, predetermined or adapted during training, to allocate efforts effectively across states. We establish the convergence properties of our method within a specific class of emphasis functions and showcase its promising potential for adaptation to deep RL contexts. Empirical results underscore that employing a judicious emphasis function not only improves value estimation but also expedites learning across diverse scenarios.  ( 2 min )
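    To make the role of the emphasis function concrete, the tabular sketch below scales each state's eligibility-trace increment by an emphasis value; DTD's actual emphasis parameterizations, and their adaptation during training, are not reproduced here.

        import numpy as np

        def emphatic_td_lambda(episodes, n_states, emphasis, alpha=0.1, gamma=0.99, lam=0.9):
            """Tabular TD(lambda) where each state's trace increment is scaled by an emphasis
            function (a simplified illustration of emphasis-weighted updates)."""
            V = np.zeros(n_states)
            for episode in episodes:                  # episode: list of (s, r, s_next, done)
                z = np.zeros(n_states)                # eligibility trace, reset each episode
                for s, r, s_next, done in episode:
                    z *= gamma * lam
                    z[s] += emphasis(s)               # emphasize some states more than others
                    delta = r + (0.0 if done else gamma * V[s_next]) - V[s]
                    V += alpha * delta * z
            return V

        # e.g. V = emphatic_td_lambda(episodes, n_states=5, emphasis=lambda s: 2.0 if s == 0 else 1.0)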
    Unraveling the Single Tangent Space Fallacy: An Analysis and Clarification for Applying Riemannian Geometry in Robot Learning. (arXiv:2310.07902v1 [cs.RO])
    In the realm of robotics, numerous downstream robotics tasks leverage machine learning methods for processing, modeling, or synthesizing data. Often, this data comprises variables that inherently carry geometric constraints, such as the unit-norm condition of quaternions representing rigid-body orientations or the positive definiteness of stiffness and manipulability ellipsoids. Handling such geometric constraints effectively requires the incorporation of tools from differential geometry into the formulation of machine learning methods. In this context, Riemannian manifolds emerge as a powerful mathematical framework to handle such geometric constraints. Nevertheless, their recent adoption in robot learning has been largely characterized by a mathematically-flawed simplification, hereinafter referred to as the ``single tangent space fallacy". This approach involves merely projecting the data of interest onto a single tangent (Euclidean) space, over which an off-the-shelf learning algorithm is applied. This paper provides a theoretical elucidation of various misconceptions surrounding this approach and offers experimental evidence of its shortcomings. Finally, it presents valuable insights to promote best practices when employing Riemannian geometry within robot learning applications.  ( 2 min )
    LEMON: Lossless model expansion. (arXiv:2310.07999v1 [cs.LG])
    Scaling of deep neural networks, especially Transformers, is pivotal for their surging performance and has further led to the emergence of sophisticated reasoning capabilities in foundation models. Such scaling generally requires training large models from scratch with random initialization, failing to leverage the knowledge acquired by their smaller counterparts, which are already resource-intensive to obtain. To tackle this inefficiency, we present $\textbf{L}$ossl$\textbf{E}$ss $\textbf{MO}$del Expansio$\textbf{N}$ (LEMON), a recipe to initialize scaled models using the weights of their smaller but pre-trained counterparts. This is followed by model training with an optimized learning rate scheduler tailored explicitly for the scaled models, substantially reducing the training time compared to training from scratch. Notably, LEMON is versatile, ensuring compatibility with various network structures, including models like Vision Transformers and BERT. Our empirical results demonstrate that LEMON reduces computational costs by 56.7% for Vision Transformers and 33.2% for BERT when compared to training from scratch.  ( 2 min )
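    LEMON's full recipe covers Transformer blocks, LayerNorm, and depth expansion; the sketch below only conveys the underlying idea of a function-preserving (lossless) width expansion for a pair of linear layers, in the spirit of Net2Net-style neuron duplication, and should not be read as the paper's exact construction.

        import torch
        import torch.nn as nn

        def widen_linear_pair(fc1: nn.Linear, fc2: nn.Linear, new_width: int):
            """Duplicate hidden units and split their outgoing weights so the widened
            Linear -> ReLU -> Linear block computes exactly the same function."""
            old_width = fc1.out_features
            extra = torch.randint(0, old_width, (new_width - old_width,))
            mapping = torch.cat([torch.arange(old_width), extra])        # which old unit each new unit copies
            counts = torch.bincount(mapping, minlength=old_width).float()

            wide1 = nn.Linear(fc1.in_features, new_width)
            wide1.weight.data = fc1.weight.data[mapping]
            wide1.bias.data = fc1.bias.data[mapping]

            wide2 = nn.Linear(new_width, fc2.out_features)
            wide2.weight.data = fc2.weight.data[:, mapping] / counts[mapping]   # split duplicated columns
            wide2.bias.data = fc2.bias.data.clone()
            return wide1, wide2

        fc1, fc2 = nn.Linear(16, 32), nn.Linear(32, 8)
        w1, w2 = widen_linear_pair(fc1, fc2, new_width=48)
        x = torch.randn(4, 16)
        assert torch.allclose(fc2(torch.relu(fc1(x))), w2(torch.relu(w1(x))), atol=1e-5)

    The assertion holds because duplicated hidden units produce identical activations, and dividing their outgoing weights by the duplication count keeps every downstream pre-activation unchanged.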
    TabLib: A Dataset of 627M Tables with Context. (arXiv:2310.07875v1 [cs.CL])
    It is well-established that large, diverse datasets play a pivotal role in the performance of modern AI systems for text and image modalities. However, there are no datasets for tabular data of comparable size and diversity to those available for text and images. Thus we present "TabLib", a compilation of 627 million tables totaling 69 TiB, along with 867B tokens of context. TabLib was extracted from numerous file formats, including CSV, HTML, SQLite, PDF, Excel, and others, sourced from GitHub and Common Crawl. The size and diversity of TabLib offer considerable promise in the table modality, reminiscent of the original promise of foundational datasets for text and images, such as The Pile and LAION.  ( 2 min )
    Enhanced sampling of Crystal Nucleation with Graph Representation Learnt Variables. (arXiv:2310.07927v1 [cond-mat.stat-mech])
    In this study, we present a graph neural network-based learning approach using an autoencoder setup to derive low-dimensional variables from features observed in experimental crystal structures. These variables are then biased in enhanced sampling to observe state-to-state transitions and reliable thermodynamic weights. Our approach uses simple convolution and pooling methods. To verify the effectiveness of our protocol, we examined the nucleation of various allotropes and polymorphs of iron and glycine from their molten states. Our graph latent variables when biased in well-tempered metadynamics consistently show transitions between states and achieve accurate free energy calculations in agreement with experiments, both of which are indicators of dependable sampling. This underscores the strength and promise of our graph neural net variables for improved sampling. The protocol shown here should be applicable for other systems and with other sampling methods.  ( 2 min )
    Online RL in Linearly $q^\pi$-Realizable MDPs Is as Easy as in Linear MDPs If You Learn What to Ignore. (arXiv:2310.07811v1 [cs.LG])
    We consider online reinforcement learning (RL) in episodic Markov decision processes (MDPs) under the linear $q^\pi$-realizability assumption, where it is assumed that the action-values of all policies can be expressed as linear functions of state-action features. This class is known to be more general than linear MDPs, where the transition kernel and the reward function are assumed to be linear functions of the feature vectors. As our first contribution, we show that the difference between the two classes is the presence of states in linearly $q^\pi$-realizable MDPs where for any policy, all the actions have approximately equal values, and skipping over these states by following an arbitrarily fixed policy in those states transforms the problem to a linear MDP. Based on this observation, we derive a novel (computationally inefficient) learning algorithm for linearly $q^\pi$-realizable MDPs that simultaneously learns what states should be skipped over and runs another learning algorithm on the linear MDP hidden in the problem. The method returns an $\epsilon$-optimal policy after $\text{polylog}(H, d)/\epsilon^2$ interactions with the MDP, where $H$ is the time horizon and $d$ is the dimension of the feature vectors, giving the first polynomial-sample-complexity online RL algorithm for this setting. The results are proved for the misspecified case, where the sample complexity is shown to degrade gracefully with the misspecification error.  ( 3 min )
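    For readers unfamiliar with the terminology, the two assumptions contrasted here can be written compactly (standard statements, not this paper's exact notation). Linear $q^\pi$-realizability requires
    \[ Q^\pi(s,a) = \langle \varphi(s,a),\, w_\pi \rangle \quad \text{for all policies } \pi \text{ and all } (s,a), \]
    whereas the stronger linear-MDP assumption additionally requires the dynamics and reward themselves to be linear in the same features,
    \[ P(s' \mid s,a) = \langle \varphi(s,a),\, \mu(s') \rangle, \qquad r(s,a) = \langle \varphi(s,a),\, \theta \rangle . \]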
    Large Language Models Are Zero-Shot Time Series Forecasters. (arXiv:2310.07820v1 [cs.LG])
    By encoding time series as a string of numerical digits, we can frame time series forecasting as next-token prediction in text. Developing this approach, we find that large language models (LLMs) such as GPT-3 and LLaMA-2 can surprisingly zero-shot extrapolate time series at a level comparable to or exceeding the performance of purpose-built time series models trained on the downstream tasks. To facilitate this performance, we propose procedures for effectively tokenizing time series data and converting discrete distributions over tokens into highly flexible densities over continuous values. We argue the success of LLMs for time series stems from their ability to naturally represent multimodal distributions, in conjunction with biases toward simplicity and repetition, which align with the salient features in many time series, such as repeated seasonal trends. We also show how LLMs can naturally handle missing data without imputation through non-numerical text, accommodate textual side information, and answer questions to help explain predictions. While we find that increasing model size generally improves performance on time series, we show GPT-4 can perform worse than GPT-3 because of how it tokenizes numbers and its poor uncertainty calibration, which is likely the result of alignment interventions such as RLHF.  ( 2 min )
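    As a rough illustration of the digit-string encoding idea described above, here is a minimal sketch; the rescaling rule, precision, and separators are illustrative assumptions, not the paper's exact settings.
```python
import numpy as np

def serialize_series(values, precision=2, scale_percentile=0.9):
    """Encode a numeric series as a digit string for LLM next-token prediction.

    Illustrative choices (not necessarily the paper's exact recipe): rescale by
    an upper percentile of the absolute values, keep `precision` decimals,
    separate digits with spaces and time steps with ' , '.
    """
    scale = np.percentile(np.abs(values), 100 * scale_percentile) or 1.0
    tokens = []
    for v in values:
        s = f"{v / scale:.{precision}f}".replace(".", "")  # drop the decimal point
        tokens.append(" ".join(s))                          # space-separate digits
    return " , ".join(tokens), scale

text, scale = serialize_series(np.array([0.64, 0.70, 0.81, 0.95]))
print(text)  # a digit string such as "0 7 0 , 0 7 7 , ..."; prompt an LLM to continue it
```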
    ASV Station Keeping under Wind Disturbances using Neural Network Simulation Error Minimization Model Predictive Control. (arXiv:2310.07892v1 [cs.RO])
    Station keeping is an essential maneuver for Autonomous Surface Vehicles (ASVs), mainly when used in confined spaces, to carry out surveys that require the ASV to keep its position, or in collaboration with other vehicles where the relative position has an impact on the mission. However, this maneuver can become challenging for classic feedback controllers due to the need for an accurate model of the ASV dynamics and the environmental disturbances. This work proposes a Model Predictive Controller using Neural Network Simulation Error Minimization (NNSEM-MPC) to accurately predict the dynamics of the ASV under wind disturbances. The performance of the proposed scheme under wind disturbances is tested and compared against other controllers in simulation, using the Robot Operating System (ROS) and the multipurpose simulation environment Gazebo. A set of six tests was conducted by combining two wind speeds (3 m/s and 6 m/s) and three wind directions (0$^\circ$, 90$^\circ$, and 180$^\circ$). The simulation results clearly show the advantage of the NNSEM-MPC over the following methods: backstepping controller, sliding mode controller, simplified dynamics MPC (SD-MPC), neural ordinary differential equation MPC (NODE-MPC), and knowledge-based NODE MPC (KNODE-MPC). The proposed NNSEM-MPC approach performs better than the rest in 4 out of the 6 test conditions, and it is the second best in the 2 remaining test cases, reducing the mean position and heading error by at least 31\% and 46\%, respectively, across all the test cases. In terms of execution speed, the proposed NNSEM-MPC is at least 36\% faster than the rest of the MPC controllers. The field experiments on two different ASV platforms showed that ASVs can effectively keep station using the proposed method, with a position error as low as $1.68$ m and a heading error as low as $6.14^{\circ}$ within time windows of at least $150$ s.  ( 3 min )
    NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration. (arXiv:2310.07896v1 [cs.RO])
    Robotic learning for navigation in unfamiliar environments needs to provide policies for both task-oriented navigation (i.e., reaching a goal that the robot has located), and task-agnostic exploration (i.e., searching for a goal in a novel setting). Typically, these roles are handled by separate models, for example by using subgoal proposals, planning, or separate navigation strategies. In this paper, we describe how we can train a single unified diffusion policy to handle both goal-directed navigation and goal-agnostic exploration, with the latter providing the ability to search novel environments, and the former providing the ability to reach a user-specified goal once it has been located. We show that this unified policy results in better overall performance when navigating to visually indicated goals in novel environments, as compared to approaches that use subgoal proposals from generative models, or prior methods based on latent variable models. We instantiate our method by using a large-scale Transformer-based policy trained on data from multiple ground robots, with a diffusion model decoder to flexibly handle both goal-conditioned and goal-agnostic navigation. Our experiments, conducted on a real-world mobile robot platform, show effective navigation in unseen environments in comparison with five alternative methods, and demonstrate significant improvements in performance and lower collision rates, despite utilizing smaller models than state-of-the-art approaches. For more videos, code, and pre-trained model checkpoints, see https://general-navigation-models.github.io/nomad/  ( 2 min )
    RandCom: Random Communication Skipping Method for Decentralized Stochastic Optimization. (arXiv:2310.07983v1 [cs.LG])
    Distributed optimization methods with random communication skips are gaining increasing attention due to their proven benefits in reducing communication complexity. Nevertheless, existing research mainly focuses on centralized communication protocols for strongly convex deterministic settings. In this work, we provide a decentralized optimization method called RandCom, which incorporates probabilistic local updates. We analyze the performance of RandCom in stochastic non-convex, convex, and strongly convex settings and demonstrate its ability to asymptotically reduce communication overhead by the probability of communication. Additionally, we prove that RandCom achieves linear speedup as the number of nodes increases. In stochastic strongly convex settings, we further prove that RandCom can achieve linear speedup with network-independent stepsizes. Moreover, we apply RandCom to federated learning and provide positive results concerning the potential for achieving linear speedup and the suitability of the probabilistic local update approach for non-convex settings.  ( 2 min )
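    Schematically, probabilistic local updates in a decentralized setting look like the sketch below. This is a generic illustration of communication skipping with a gossip mixing matrix, not the exact RandCom recursion; the toy problem and all hyperparameters are assumptions.
```python
import numpy as np

def decentralized_sgd_random_skip(grad_fn, W, x0, steps=1000, lr=0.05, p_comm=0.3, seed=0):
    """Decentralized SGD with probabilistic communication skipping (schematic).

    Each node i holds x[i]; every step all nodes take a local stochastic gradient
    step, and with probability p_comm a gossip round averages iterates through
    the mixing matrix W (row-stochastic, matching the communication graph).
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)                 # shape (n_nodes, dim)
    for _ in range(steps):
        g = np.stack([grad_fn(i, x[i], rng) for i in range(len(x))])
        x = x - lr * g                            # local update on every node
        if rng.random() < p_comm:                 # communicate only sometimes
            x = W @ x                             # one gossip averaging round
    return x

# Toy quadratic: node i pulls its iterate toward target t[i]; consensus plus optimization.
n, d = 4, 2
targets = np.arange(n * d, dtype=float).reshape(n, d)
grad_fn = lambda i, xi, rng: (xi - targets[i]) + 0.01 * rng.standard_normal(d)
W = np.full((n, n), 1.0 / n)                      # fully connected uniform mixing
print(decentralized_sgd_random_skip(grad_fn, W, np.zeros((n, d))))
```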
    A Review of Machine Learning Techniques in Imbalanced Data and Future Trends. (arXiv:2310.07917v1 [cs.LG])
    For over two decades, detecting rare events has been a challenging task among researchers in the data mining and machine learning domain. Real-life problems inspire researchers to navigate and further improve data processing and algorithmic approaches to achieve effective and computationally efficient methods for imbalanced learning. In this paper, we have collected and reviewed 258 peer-reviewed papers from archival journals and conference papers in an attempt to provide an in-depth review of various approaches in imbalanced learning from technical and application perspectives. This work aims to provide a structured review of methods used to address the problem of imbalanced data in various domains and create a general guideline for researchers in academia or industry who want to dive into the broad field of machine learning using large-scale imbalanced data.  ( 2 min )
    QArchSearch: A Scalable Quantum Architecture Search Package. (arXiv:2310.07858v1 [quant-ph])
    The current era of quantum computing has yielded several algorithms that promise high computational efficiency. While the algorithms are sound in theory and can provide potentially exponential speedup, there is little guidance on how to design proper quantum circuits to realize the appropriate unitary transformation to be applied to the input quantum state. In this paper, we present \texttt{QArchSearch}, an AI based quantum architecture search package with the \texttt{QTensor} library as a backend that provides a principled and automated approach to finding the best model given a task and input quantum state. We show that the search package is able to efficiently scale the search to large quantum circuits and enables the exploration of more complex models for different quantum applications. \texttt{QArchSearch} runs at scale and high efficiency on high-performance computing systems using a two-level parallelization scheme on both CPUs and GPUs, which has been demonstrated on the Polaris supercomputer.  ( 2 min )
    When, Why and How Much? Adaptive Learning Rate Scheduling by Refinement. (arXiv:2310.07831v1 [cs.LG])
    Learning rate schedules used in practice bear little resemblance to those recommended by theory. We close much of this theory/practice gap, and as a consequence are able to derive new problem-adaptive learning rate schedules. Our key technical contribution is a refined analysis of learning rate schedules for a wide class of optimization algorithms (including SGD). In contrast to most prior works that study the convergence of the average iterate, we study the last iterate, which is what most people use in practice. When considering only worst-case analysis, our theory predicts that the best choice is the linear decay schedule: a popular choice in practice that sets the stepsize proportionally to $1 - t/T$, where $t$ is the current iteration and $T$ is the total number of steps. To go beyond this worst-case analysis, we use the observed gradient norms to derive schedules refined for any particular task. These refined schedules exhibit learning rate warm-up and rapid learning rate annealing near the end of training. Ours is the first systematic approach to automatically yield both of these properties. We perform the most comprehensive evaluation of learning rate schedules to date, evaluating across 10 diverse deep learning problems, a series of LLMs, and a suite of logistic regression problems. We validate that overall, the linear-decay schedule matches or outperforms all commonly used default schedules including cosine annealing, and that our schedule refinement method gives further improvements.  ( 3 min )
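    The worst-case-optimal schedule discussed above is easy to state in code. A minimal sketch follows; the warmup branch is an illustrative extra rather than part of the worst-case result, and the values in the example are arbitrary.
```python
def linear_decay_lr(base_lr, t, T, warmup=0):
    """Linear-decay schedule: lr proportional to 1 - t/T after an optional warmup.

    The linear warmup phase is an assumption added for illustration; the
    worst-case-optimal choice described in the abstract is the pure linear decay.
    """
    if warmup and t < warmup:
        return base_lr * (t + 1) / warmup          # linear warmup (illustrative)
    return base_lr * max(0.0, 1.0 - t / T)

# Example: peak learning rate 0.1 over 100 steps
print([round(linear_decay_lr(0.1, t, 100), 3) for t in (0, 25, 50, 99)])
```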
    Feature Learning and Generalization in Deep Networks with Orthogonal Weights. (arXiv:2310.07765v1 [cs.LG])
    Fully-connected deep neural networks with weights initialized from independent Gaussian distributions can be tuned to criticality, which prevents the exponential growth or decay of signals propagating through the network. However, such networks still exhibit fluctuations that grow linearly with the depth of the network, which may impair the training of networks with width comparable to depth. We show analytically that rectangular networks with tanh activations and weights initialized from the ensemble of orthogonal matrices have corresponding preactivation fluctuations which are independent of depth, to leading order in inverse width. Moreover, we demonstrate numerically that, at initialization, all correlators involving the neural tangent kernel (NTK) and its descendants at leading order in inverse width -- which govern the evolution of observables during training -- saturate at a depth of $\sim 20$, rather than growing without bound as in the case of Gaussian initializations. We speculate that this structure preserves finite-width feature learning while reducing overall noise, thus improving both generalization and training speed. We provide some experimental justification by relating empirical measurements of the NTK to the superior performance of deep nonlinear orthogonal networks trained under full-batch gradient descent on the MNIST and CIFAR-10 classification tasks.  ( 2 min )
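    For reference, drawing a weight matrix from the orthogonal ensemble is commonly done via a QR decomposition of a Gaussian matrix; a minimal sketch is below. The gain factor is a placeholder, not the critical tuning discussed in the paper.
```python
import numpy as np

def orthogonal_init(fan_in, fan_out, gain=1.0, rng=None):
    """Draw a weight matrix from the orthogonal ensemble via QR of a Gaussian matrix.

    For rectangular shapes the shorter side is orthonormal; the gain would be
    tuned to criticality for tanh networks (the default here is a placeholder).
    """
    rng = rng or np.random.default_rng(0)
    a = rng.standard_normal((fan_out, fan_in))
    q, r = np.linalg.qr(a.T if fan_out < fan_in else a)
    q *= np.sign(np.diag(r))                # fix signs so the draw is Haar-distributed
    w = q.T if fan_out < fan_in else q
    return gain * w[:fan_out, :fan_in]

W = orthogonal_init(256, 128)
print(np.allclose(W @ W.T, np.eye(128), atol=1e-6))   # rows are orthonormal
```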
    Faithfulness Measurable Masked Language Models. (arXiv:2310.07819v1 [cs.CL])
    A common approach to explain NLP models is to use importance measures that express which tokens are important for a prediction. Unfortunately, such explanations are often wrong despite being persuasive. Therefore, it is essential to measure their faithfulness. One such metric holds that if tokens are truly important, masking them should result in worse model performance. However, token masking introduces out-of-distribution issues and existing solutions are computationally expensive and employ proxy-models. Furthermore, other metrics are very limited in scope. In this work, we propose an inherently faithfulness measurable model that addresses these challenges. This is achieved by using a novel fine-tuning method that incorporates masking, such that masking tokens become in-distribution by design. This differs from existing approaches, which are completely model-agnostic but are inapplicable in practice. We demonstrate the generality of our approach by applying it to various tasks and validate it using statistical in-distribution tests. Additionally, because masking is in-distribution, importance measures which themselves use masking become more faithful, and thus our model becomes more explainable.  ( 2 min )
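    A sketch of the kind of random-masking step such fine-tuning implies is shown below, using BERT-style special-token ids purely for illustration. The per-sequence rate sampling is an assumption, not the paper's exact recipe.
```python
import torch

def randomly_mask_tokens(input_ids, mask_token_id, special_mask, max_rate=0.5, rng=None):
    """Randomly mask tokens during fine-tuning so masked inputs are in-distribution.

    Sample a masking rate per sequence and replace that fraction of non-special
    tokens with the mask token. Rates and sampling scheme are illustrative.
    """
    rng = rng or torch.Generator().manual_seed(0)
    rates = torch.rand(input_ids.size(0), 1, generator=rng) * max_rate
    draw = torch.rand(input_ids.shape, generator=rng)
    to_mask = (draw < rates) & ~special_mask          # never mask [CLS]/[SEP]/padding
    return torch.where(to_mask, torch.full_like(input_ids, mask_token_id), input_ids)

# BERT-style ids: 101=[CLS], 102=[SEP], 103=[MASK] (illustrative usage)
ids = torch.tensor([[101, 2023, 2003, 1037, 3231, 102]])
special = torch.tensor([[True, False, False, False, False, True]])
print(randomly_mask_tokens(ids, mask_token_id=103, special_mask=special))
# At inference, token importance can then be scored by masking a token and measuring
# the drop in the prediction, which is now an in-distribution query.
```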
    Using Spark Machine Learning Models to Perform Predictive Analysis on Flight Ticket Pricing Data. (arXiv:2310.07787v1 [cs.LG])
    This paper discusses the predictive performance, evaluated with $R^2$ (r-squared) and RMSE, and the modeling process applied to flight pricing data, leveraging a large dataset originally from Expedia.com consisting of approximately 20 million records, or 4.68 gigabytes. The project aims to determine the best models usable in the real world to predict airline ticket fares for non-stop flights across the US. Therefore, good generalization capability and optimized processing times are important measures for the model. We discover key business insights using feature importance and discuss the process and tools used for our analysis. Four regression machine learning algorithms were utilized: Random Forest, Gradient-Boosted Tree, Decision Tree, and Factorization Machines, using Cross Validator and Training Validator functions for assessing performance and generalization capability.  ( 2 min )
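    A hedged sketch of such a Spark ML workflow is given below; it requires a Spark installation, and the file path and column names are hypothetical since the abstract does not give the Expedia schema or the exact grid searched.
```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import RandomForestRegressor
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

spark = SparkSession.builder.appName("flight-fare-regression").getOrCreate()
df = spark.read.csv("flight_prices.csv", header=True, inferSchema=True)  # hypothetical file

# Hypothetical feature columns; swap in the actual schema.
assembler = VectorAssembler(
    inputCols=["days_to_departure", "flight_duration_min", "seats_remaining"],
    outputCol="features")
rf = RandomForestRegressor(featuresCol="features", labelCol="fare")
pipeline = Pipeline(stages=[assembler, rf])

grid = ParamGridBuilder().addGrid(rf.numTrees, [50, 100]).addGrid(rf.maxDepth, [5, 10]).build()
evaluator = RegressionEvaluator(labelCol="fare", metricName="rmse")
cv = CrossValidator(estimator=pipeline, estimatorParamMaps=grid,
                    evaluator=evaluator, numFolds=3)

train, test = df.randomSplit([0.8, 0.2], seed=42)
model = cv.fit(train)
preds = model.transform(test)
print("RMSE:", evaluator.evaluate(preds))
print("R2:", RegressionEvaluator(labelCol="fare", metricName="r2").evaluate(preds))
```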
    Self-supervised Representation Learning From Random Data Projectors. (arXiv:2310.07756v1 [cs.LG])
    Self-supervised representation learning (SSRL) has advanced considerably by exploiting the transformation invariance assumption under artificially designed data augmentations. While augmentation-based SSRL algorithms push the boundaries of performance in computer vision and natural language processing, they are often not directly applicable to other data modalities, and can conflict with application-specific data augmentation constraints. This paper presents an SSRL approach that can be applied to any data modality and network architecture because it does not rely on augmentations or masking. Specifically, we show that high-quality data representations can be learned by reconstructing random data projections. We evaluate the proposed approach on a wide range of representation learning tasks that span diverse modalities and real-world applications. We show that it outperforms multiple state-of-the-art SSRL baselines. Due to its wide applicability and strong empirical results, we argue that learning from randomness is a fruitful research direction worthy of attention and further study.  ( 2 min )
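    A rough sketch of the core idea, learning a representation by regressing onto fixed random projections of the input, is below. The architecture, sizes, and training loop are illustrative assumptions rather than the paper's configuration.
```python
import torch
import torch.nn as nn

# Fixed random linear projectors supply regression targets; a shared encoder plus
# per-projector heads is trained to predict them (illustrative sizes).
d_in, d_rep, d_proj, n_proj = 32, 64, 16, 4
projectors = [nn.Linear(d_in, d_proj).requires_grad_(False) for _ in range(n_proj)]

encoder = nn.Sequential(nn.Linear(d_in, 128), nn.ReLU(), nn.Linear(128, d_rep))
heads = nn.ModuleList([nn.Linear(d_rep, d_proj) for _ in range(n_proj)])
opt = torch.optim.Adam(list(encoder.parameters()) + list(heads.parameters()), lr=1e-3)

for step in range(200):
    x = torch.randn(256, d_in)                    # stand-in for a data batch
    z = encoder(x)
    loss = sum(nn.functional.mse_loss(h(z), p(x)) for h, p in zip(heads, projectors))
    opt.zero_grad()
    loss.backward()
    opt.step()
```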
    Parametric Leaky Tanh: A New Hybrid Activation Function for Deep Learning. (arXiv:2310.07720v1 [cs.LG])
    Activation functions (AFs) are crucial components of deep neural networks (DNNs), having a significant impact on their performance. An activation function in a DNN is typically a smooth, nonlinear function that transforms an input signal into an output signal for the subsequent layer. In this paper, we propose the Parametric Leaky Tanh (PLTanh), a novel hybrid activation function designed to combine the strengths of both the Tanh and Leaky ReLU (LReLU) activation functions. PLTanh is differentiable at all points and addresses the 'dying ReLU' problem by ensuring a non-zero gradient for negative inputs, consistent with the behavior of LReLU. By integrating the unique advantages of these two diverse activation functions, PLTanh facilitates the learning of more intricate nonlinear relationships within the network. This paper presents an empirical evaluation of PLTanh against established activation functions, namely ReLU, LReLU, and ALReLU, using five diverse datasets.  ( 2 min )
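    The abstract does not give PLTanh's functional form; purely as a generic illustration of a tanh/leaky hybrid (not the paper's definition), one might write something like the following.
```python
import torch

def hybrid_leaky_tanh(x, alpha=0.05):
    """A generic hybrid activation: tanh for positive inputs, tanh plus a small leaky
    linear term for negative inputs so the gradient stays bounded away from zero there.
    This is an illustrative stand-in; the exact PLTanh parameterization is not
    specified in the abstract."""
    return torch.where(x >= 0, torch.tanh(x), torch.tanh(x) + alpha * x)

print(hybrid_leaky_tanh(torch.tensor([-3.0, 0.0, 2.0])))
```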
    Visual Forecasting as a Mid-level Representation for Avoidance. (arXiv:2310.07724v1 [cs.RO])
    The challenge of navigation in environments with dynamic objects continues to be a central issue in the study of autonomous agents. While predictive methods hold promise, their reliance on precise state information makes them less practical for real-world implementation. This study presents visual forecasting as an innovative alternative. By introducing intuitive visual cues, this approach projects the future trajectories of dynamic objects to improve agent perception and enable anticipatory actions. Our research explores two distinct strategies for conveying predictive information through visual forecasting: (1) sequences of bounding boxes, and (2) augmented paths. To validate the proposed visual forecasting strategies, we initiate evaluations in simulated environments using the Unity engine and then extend these evaluations to real-world scenarios to assess both practicality and effectiveness. The results confirm the viability of visual forecasting as a promising solution for navigation and obstacle avoidance in dynamic environments.  ( 2 min )
  • Open

    Feature Learning and Generalization in Deep Networks with Orthogonal Weights. (arXiv:2310.07765v1 [cs.LG])
    Fully-connected deep neural networks with weights initialized from independent Gaussian distributions can be tuned to criticality, which prevents the exponential growth or decay of signals propagating through the network. However, such networks still exhibit fluctuations that grow linearly with the depth of the network, which may impair the training of networks with width comparable to depth. We show analytically that rectangular networks with tanh activations and weights initialized from the ensemble of orthogonal matrices have corresponding preactivation fluctuations which are independent of depth, to leading order in inverse width. Moreover, we demonstrate numerically that, at initialization, all correlators involving the neural tangent kernel (NTK) and its descendants at leading order in inverse width -- which govern the evolution of observables during training -- saturate at a depth of $\sim 20$, rather than growing without bound as in the case of Gaussian initializations. We speculate that this structure preserves finite-width feature learning while reducing overall noise, thus improving both generalization and training speed. We provide some experimental justification by relating empirical measurements of the NTK to the superior performance of deep nonlinear orthogonal networks trained under full-batch gradient descent on the MNIST and CIFAR-10 classification tasks.  ( 2 min )
    LEMON: Lossless model expansion. (arXiv:2310.07999v1 [cs.LG])
    Scaling of deep neural networks, especially Transformers, is pivotal for their surging performance and has further led to the emergence of sophisticated reasoning capabilities in foundation models. Such scaling generally requires training large models from scratch with random initialization, failing to leverage the knowledge acquired by their smaller counterparts, which are already resource-intensive to obtain. To tackle this inefficiency, we present $\textbf{L}$ossl$\textbf{E}$ss $\textbf{MO}$del Expansio$\textbf{N}$ (LEMON), a recipe to initialize scaled models using the weights of their smaller but pre-trained counterparts. This is followed by model training with an optimized learning rate scheduler tailored explicitly for the scaled models, substantially reducing the training time compared to training from scratch. Notably, LEMON is versatile, ensuring compatibility with various network structures, including models like Vision Transformers and BERT. Our empirical results demonstrate that LEMON reduces computational costs by 56.7% for Vision Transformers and 33.2% for BERT when compared to training from scratch.
    $L^1$ Estimation: On the Optimality of Linear Estimators. (arXiv:2309.09129v2 [math.ST] UPDATED)
    Consider the problem of estimating a random variable $X$ from noisy observations $Y = X+ Z$, where $Z$ is standard normal, under the $L^1$ fidelity criterion. It is well known that the optimal Bayesian estimator in this setting is the conditional median. This work shows that the only prior distribution on $X$ that induces linearity in the conditional median is Gaussian. Along the way, several other results are presented. In particular, it is demonstrated that if the conditional distribution $P_{X|Y=y}$ is symmetric for all $y$, then $X$ must follow a Gaussian distribution. Additionally, we consider other $L^p$ losses and observe the following phenomenon: for $p \in [1,2]$, Gaussian is the only prior distribution that induces a linear optimal Bayesian estimator, and for $p \in (2,\infty)$, infinitely many prior distributions on $X$ can induce linearity. Finally, extensions are provided to encompass noise models leading to conditional distributions from certain exponential families.
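    A quick Monte Carlo check of the headline claim in the Gaussian-prior case is below; the prior variance, conditioning window, and sample size are arbitrary choices for illustration.
```python
import numpy as np

# With a Gaussian prior X ~ N(0, s^2) and Y = X + Z, Z ~ N(0, 1), the posterior is
# Gaussian, so its median equals its mean s^2/(1 + s^2) * y: the conditional median
# is linear in y. The check below conditions crudely on Y being near y0.
rng = np.random.default_rng(0)
s2 = 2.0
x = rng.normal(0.0, np.sqrt(s2), size=2_000_000)
y = x + rng.standard_normal(x.size)

for y0 in (-2.0, 0.5, 3.0):
    mask = np.abs(y - y0) < 0.02                  # crude conditioning on Y close to y0
    emp_median = np.median(x[mask])
    print(y0, round(emp_median, 3), round(s2 / (1 + s2) * y0, 3))  # empirical vs. linear formula
```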
    A Symmetry-Aware Exploration of Bayesian Neural Network Posteriors. (arXiv:2310.08287v1 [stat.ML])
    The distribution of the weights of modern deep neural networks (DNNs) - crucial for uncertainty quantification and robustness - is an eminently complex object due to its extremely high dimensionality. This paper proposes one of the first large-scale explorations of the posterior distribution of deep Bayesian Neural Networks (BNNs), expanding its study to real-world vision tasks and architectures. Specifically, we investigate the optimal approach for approximating the posterior, analyze the connection between posterior quality and uncertainty quantification, delve into the impact of modes on the posterior, and explore methods for visualizing the posterior. Moreover, we uncover weight-space symmetries as a critical aspect for understanding the posterior. To this end, we develop an in-depth assessment of the impact of both permutation and scaling symmetries that tend to obfuscate the Bayesian posterior. While the first type of transformation is known for duplicating modes, we explore the relationship between the latter and L2 regularization, challenging previous misconceptions. Finally, to help the community improve our understanding of the Bayesian posterior, we will shortly release the first large-scale checkpoint dataset, including thousands of real-world models and our code.
    A general framework for multi-step ahead adaptive conformal heteroscedastic time series forecasting. (arXiv:2207.14219v9 [stat.ML] UPDATED)
    This paper introduces a novel model-agnostic algorithm called adaptive ensemble batch multi-input multi-output conformalized quantile regression (AEnbMIMOCQR) that enables forecasters to generate multi-step ahead prediction intervals for a fixed pre-specified miscoverage rate in a distribution-free manner. Our method is grounded on conformal prediction principles; however, it does not require data splitting and provides close to exact coverage even when the data is not exchangeable. Moreover, the resulting prediction intervals, besides being empirically valid along the forecast horizon, do not neglect heteroscedasticity. AEnbMIMOCQR is designed to be robust to distribution shifts, which means that its prediction intervals remain reliable over an unlimited period of time, without entailing retraining or imposing unrealistic strict assumptions on the data-generating process. Through methodical experimentation, we demonstrate that our approach outperforms other competitive methods on both real-world and synthetic datasets. The code used in the experimental part and a tutorial on how to use AEnbMIMOCQR can be found at the following GitHub repository: https://github.com/Quilograma/AEnbMIMOCQR.
    RandCom: Random Communication Skipping Method for Decentralized Stochastic Optimization. (arXiv:2310.07983v1 [cs.LG])
    Distributed optimization methods with random communication skips are gaining increasing attention due to their proven benefits in reducing communication complexity. Nevertheless, existing research mainly focuses on centralized communication protocols for strongly convex deterministic settings. In this work, we provide a decentralized optimization method called RandCom, which incorporates probabilistic local updates. We analyze the performance of RandCom in stochastic non-convex, convex, and strongly convex settings and demonstrate its ability to asymptotically reduce communication overhead by the probability of communication. Additionally, we prove that RandCom achieves linear speedup as the number of nodes increases. In stochastic strongly convex settings, we further prove that RandCom can achieve linear speedup with network-independent stepsizes. Moreover, we apply RandCom to federated learning and provide positive results concerning the potential for achieving linear speedup and the suitability of the probabilistic local update approach for non-convex settings.  ( 2 min )
    Characterizing climate pathways using feature importance on echo state networks. (arXiv:2310.08495v1 [stat.ML])
    The 2022 National Defense Strategy of the United States listed climate change as a serious threat to national security. Climate intervention methods, such as stratospheric aerosol injection, have been proposed as mitigation strategies, but the downstream effects of such actions on a complex climate system are not well understood. The development of algorithmic techniques for quantifying relationships between source and impact variables related to a climate event (i.e., a climate pathway) would help inform policy decisions. Data-driven deep learning models have become powerful tools for modeling highly nonlinear relationships and may provide a route to characterize climate variable relationships. In this paper, we explore the use of an echo state network (ESN) for characterizing climate pathways. ESNs are a computationally efficient neural network variation designed for temporal data, and recent work proposes ESNs as a useful tool for forecasting spatio-temporal climate data. Like other neural networks, ESNs are non-interpretable black-box models, which poses a hurdle for understanding variable relationships. We address this issue by developing feature importance methods for ESNs in the context of spatio-temporal data to quantify variable relationships captured by the model. We conduct a simulation study to assess and compare the feature importance techniques, and we demonstrate the approach on reanalysis climate data. In the climate application, we select a time period that includes the 1991 volcanic eruption of Mount Pinatubo. This event was a significant stratospheric aerosol injection, which we use as a proxy for an artificial stratospheric aerosol injection. Using the proposed approach, we are able to characterize relationships between pathway variables associated with this event.  ( 3 min )
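    For readers unfamiliar with the model class, a minimal echo state network of the kind such feature-importance methods would be built on might look like the sketch below; reservoir size, scaling, and the toy task are illustrative assumptions, not the paper's setup.
```python
import numpy as np

def fit_esn(inputs, targets, n_reservoir=200, spectral_radius=0.9, ridge=1e-6, seed=0):
    """Minimal echo state network: a fixed random recurrent reservoir driven by the
    inputs, with only the linear readout trained by ridge regression."""
    rng = np.random.default_rng(seed)
    n_in = inputs.shape[1]
    W_in = rng.uniform(-0.5, 0.5, size=(n_reservoir, n_in))
    W = rng.standard_normal((n_reservoir, n_reservoir))
    W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))   # control the echo-state property
    states = np.zeros((len(inputs), n_reservoir))
    h = np.zeros(n_reservoir)
    for t, u in enumerate(inputs):
        h = np.tanh(W_in @ u + W @ h)
        states[t] = h
    # Ridge-regression readout from reservoir states to targets
    W_out = np.linalg.solve(states.T @ states + ridge * np.eye(n_reservoir), states.T @ targets)
    return W_in, W, W_out

# Toy usage: predict the next value of a noisy sine wave from its current value
t = np.linspace(0, 20 * np.pi, 2000)
x = np.sin(t) + 0.05 * np.random.default_rng(1).standard_normal(t.size)
W_in, W, W_out = fit_esn(x[:-1, None], x[1:, None])
```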
    Impact of multi-armed bandit strategies on deep recurrent reinforcement learning. (arXiv:2310.08331v1 [stat.ML])
    Incomplete knowledge of the environment leads an agent to make decisions under uncertainty. One of the major dilemmas in Reinforcement Learning (RL), where an autonomous agent has to balance two contrasting needs in making its decisions, is exploiting the current knowledge of the environment to maximize the cumulative reward while also exploring actions that improve knowledge of the environment, hopefully leading to higher reward values (the exploration-exploitation trade-off). Concurrently, another relevant issue regards the full observability of the states, which may not be assumed in all applications, such as when only 2D images are used as input to an RL approach for finding the optimal action within a 3D simulation environment. In this work, we address these issues by deploying and testing several techniques to balance the exploration-exploitation trade-off on partially observable systems for predicting steering-wheel actions in an autonomous driving scenario. More precisely, the final aim is to investigate the effects of using both stochastic and deterministic multi-armed bandit strategies coupled with a Deep Recurrent Q-Network. Additionally, we adapted an innovative method to improve the learning phase of the underlying Convolutional Recurrent Neural Network and evaluated its impact. We aim to show that adaptive stochastic methods for exploration better approximate the trade-off between exploration and exploitation as, in general, Softmax and Max-Boltzmann strategies are able to outperform epsilon-greedy techniques.  ( 2 min )
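    For concreteness, the exploration strategies being compared can be sketched as follows; the max-Boltzmann variant shown is one common formulation, assumed here rather than taken from the paper.
```python
import numpy as np

def epsilon_greedy(q_values, eps, rng):
    """Classic epsilon-greedy: explore uniformly at random with probability eps."""
    if rng.random() < eps:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def softmax_action(q_values, temperature, rng):
    """Softmax (Boltzmann) exploration: sample actions in proportion to exp(Q / T)."""
    z = np.asarray(q_values) / temperature
    p = np.exp(z - z.max())
    p /= p.sum()
    return int(rng.choice(len(q_values), p=p))

def max_boltzmann(q_values, eps, temperature, rng):
    """Max-Boltzmann (one common formulation): greedy with probability 1 - eps,
    otherwise sample from the Boltzmann distribution instead of uniformly."""
    if rng.random() < eps:
        return softmax_action(q_values, temperature, rng)
    return int(np.argmax(q_values))

rng = np.random.default_rng(0)
q = [0.1, 0.5, 0.2]
print(epsilon_greedy(q, 0.1, rng), softmax_action(q, 0.5, rng), max_boltzmann(q, 0.1, 0.5, rng))
```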
    Clustering Three-Way Data with Outliers. (arXiv:2310.05288v2 [stat.ML] UPDATED)
    Matrix-variate distributions are a recent addition to the model-based clustering field, thereby making it possible to analyze data in matrix form with complex structure such as images and time series. Due to its recent appearance, there is limited literature on matrix-variate data, with even less on dealing with outliers in these models. An approach for clustering matrix-variate normal data with outliers is discussed. The approach, which uses the distribution of subset log-likelihoods, extends the OCLUST algorithm to matrix-variate normal data and uses an iterative approach to detect and trim outliers.  ( 2 min )
    Contextualized Policy Recovery: Modeling and Interpreting Medical Decisions with Adaptive Imitation Learning. (arXiv:2310.07918v1 [cs.LG])
    Interpretable policy learning seeks to estimate intelligible decision policies from observed actions; however, existing models fall short by forcing a tradeoff between accuracy and interpretability. This tradeoff limits data-driven interpretations of the human decision-making process: for example, to audit medical decisions for biases and suboptimal practices, we require models of decision processes which provide concise descriptions of complex behaviors. Fundamentally, existing approaches are burdened by this tradeoff because they represent the underlying decision process as a universal policy, when in fact human decisions are dynamic and can change drastically with contextual information. Thus, we propose Contextualized Policy Recovery (CPR), which re-frames the problem of modeling complex decision processes as a multi-task learning problem in which complex decision policies are composed of context-specific policies. CPR models each context-specific policy as a linear observation-to-action mapping, and generates new decision models $\textit{on-demand}$ as contexts are updated with new observations. CPR is compatible with fully offline and partially observable decision environments, and can be tailored to incorporate any recurrent black-box model or interpretable decision model. We assess CPR through studies on simulated and real data, achieving state-of-the-art performance on the canonical tasks of predicting antibiotic prescription in intensive care units ($+22\%$ AUROC vs. previous SOTA) and predicting MRI prescription for Alzheimer's patients ($+7.7\%$ AUROC vs. previous SOTA). With this improvement in predictive performance, CPR closes the accuracy gap between interpretable and black-box methods for policy learning, allowing high-resolution exploration and analysis of context-specific decision models.  ( 3 min )
    Understanding Sparse Feature Updates in Deep Networks using Iterative Linearisation. (arXiv:2211.12345v4 [cs.LG] UPDATED)
    Larger and deeper networks generalise well despite their increased capacity to overfit. Understanding why this happens is theoretically and practically important. One recent approach looks at the infinitely wide limits of such networks and their corresponding kernels. However, these theoretical tools cannot fully explain finite networks as the empirical kernel changes significantly during gradient-descent-based training in contrast to infinite networks. In this work, we derive an iterative linearised training method as a novel empirical tool to further investigate this distinction, allowing us to control for sparse (i.e. infrequent) feature updates and quantify the frequency of feature learning needed to achieve comparable performance. We justify iterative linearisation as an interpolation between a finite analog of the infinite width regime, which does not learn features, and standard gradient descent training, which does. Informally, we also show that it is analogous to a damped version of the Gauss-Newton algorithm -- a second-order method. We show that in a variety of cases, iterative linearised training surprisingly performs on par with standard training, noting in particular how much less frequent feature learning is required to achieve comparable performance. We also show that feature learning is essential for good performance. Since such feature learning inevitably causes changes in the NTK kernel, we provide direct negative evidence for the NTK theory, which states the NTK kernel remains constant during training.  ( 3 min )
    Limits of Model Selection under Transfer Learning. (arXiv:2305.00152v4 [stat.ML] UPDATED)
    Theoretical studies on transfer learning or domain adaptation have so far focused on situations with a known hypothesis class or model; however in practice, some amount of model selection is usually involved, often appearing under the umbrella term of hyperparameter-tuning: for example, one may think of the problem of tuning for the right neural network architecture towards a target task, while leveraging data from a related source task. Now, in addition to the usual tradeoffs on approximation vs estimation errors involved in model selection, this problem brings in a new complexity term, namely, the transfer distance between source and target distributions, which is known to vary with the choice of hypothesis class. We present a first study of this problem, focusing on classification; in particular, the analysis reveals some remarkable phenomena: adaptive rates, i.e., those achievable with no distributional information, can be arbitrarily slower than oracle rates, i.e., when given knowledge on distances.  ( 2 min )
    A Theory of Non-Linear Feature Learning with One Gradient Step in Two-Layer Neural Networks. (arXiv:2310.07891v1 [stat.ML])
    Feature learning is thought to be one of the fundamental reasons for the success of deep neural networks. It is rigorously known that in two-layer fully-connected neural networks under certain conditions, one step of gradient descent on the first layer followed by ridge regression on the second layer can lead to feature learning; characterized by the appearance of a separated rank-one component -- spike -- in the spectrum of the feature matrix. However, with a constant gradient descent step size, this spike only carries information from the linear component of the target function and therefore learning non-linear components is impossible. We show that with a learning rate that grows with the sample size, such training in fact introduces multiple rank-one components, each corresponding to a specific polynomial feature. We further prove that the limiting large-dimensional and large sample training and test errors of the updated neural networks are fully characterized by these spikes. By precisely analyzing the improvement in the loss, we demonstrate that these non-linear features can enhance learning.  ( 2 min )
    Log-Gaussian Gamma Processes for Training Bayesian Neural Networks in Raman and CARS Spectroscopies. (arXiv:2310.08055v1 [stat.AP])
    We propose an approach utilizing gamma-distributed random variables, coupled with log-Gaussian modeling, to generate synthetic datasets suitable for training neural networks. This addresses the challenge of limited real observations in various applications. We apply this methodology to both Raman and coherent anti-Stokes Raman scattering (CARS) spectra, using experimental spectra to estimate gamma process parameters. Parameter estimation is performed using Markov chain Monte Carlo methods, yielding a full Bayesian posterior distribution for the model which can be sampled for synthetic data generation. Additionally, we model the additive and multiplicative background functions for Raman and CARS with Gaussian processes. We train two Bayesian neural networks to estimate parameters of the gamma process which can then be used to estimate the underlying Raman spectrum and simultaneously provide uncertainty through the estimation of parameters of a probability distribution. We apply the trained Bayesian neural networks to experimental Raman spectra of phthalocyanine blue, aniline black, naphthol red, and red 264 pigments and also to experimental CARS spectra of adenosine phosphate, fructose, glucose, and sucrose. The results agree with deterministic point estimates for the underlying Raman and CARS spectral signatures.  ( 2 min )
    Learning to Act from Actionless Videos through Dense Correspondences. (arXiv:2310.08576v1 [cs.RO])
    In this work, we present an approach to construct a video-based robot policy capable of reliably executing diverse tasks across different robots and environments from a few video demonstrations without using any action annotations. Our method leverages images as a task-agnostic representation, encoding both the state and action information, and text as a general representation for specifying robot goals. By synthesizing videos that ``hallucinate'' a robot executing actions, in combination with dense correspondences between frames, our approach can infer the closed-form action to execute in an environment without the need for any explicit action labels. This unique capability allows us to train the policy solely based on RGB videos and deploy learned policies to various robotic tasks. We demonstrate the efficacy of our approach in learning policies on table-top manipulation and navigation tasks. Additionally, we contribute an open-source framework for efficient video modeling, enabling the training of high-fidelity policy models with four GPUs within a single day.  ( 2 min )
    A Complete Recipe for Diffusion Generative Models. (arXiv:2303.01748v2 [cs.LG] UPDATED)
    Score-based Generative Models (SGMs) have demonstrated exceptional synthesis outcomes across various tasks. However, the current design landscape of the forward diffusion process remains largely untapped and often relies on physical heuristics or simplifying assumptions. Utilizing insights from the development of scalable Bayesian posterior samplers, we present a complete recipe for formulating forward processes in SGMs, ensuring convergence to the desired target distribution. Our approach reveals that several existing SGMs can be seen as specific manifestations of our framework. Building upon this method, we introduce Phase Space Langevin Diffusion (PSLD), which relies on score-based modeling within an augmented space enriched by auxiliary variables akin to physical phase space. Empirical results exhibit the superior sample quality and improved speed-quality trade-off of PSLD compared to various competing approaches on established image synthesis benchmarks. Remarkably, PSLD achieves sample quality akin to state-of-the-art SGMs (FID: 2.10 for unconditional CIFAR-10 generation). Lastly, we demonstrate the applicability of PSLD in conditional synthesis using pre-trained score networks, offering an appealing alternative as an SGM backbone for future advancements. Code and model checkpoints can be accessed at \url{https://github.com/mandt-lab/PSLD}.  ( 2 min )
    Smoothed $f$-Divergence Distributionally Robust Optimization. (arXiv:2306.14041v2 [math.OC] UPDATED)
    In data-driven optimization, sample average approximation (SAA) is known to suffer from the so-called optimizer's curse that causes an over-optimistic evaluation of the solution performance. We argue that a special type of distributionally robust optimization (DRO) formulation offers theoretical advantages in correcting for this optimizer's curse compared to simple ``margin'' adjustments to SAA and other DRO approaches: It attains a statistical bound on the out-of-sample performance, for a wide class of objective functions and distributions, that is nearly tightest in terms of exponential decay rate. This DRO uses an ambiguity set based on a Kullback-Leibler (KL) divergence smoothed by the Wasserstein or L\'evy-Prokhorov (LP) distance via a suitable distance optimization. Computationally, we also show that such a DRO, and its generalized versions using smoothed $f$-divergence, are not harder than DRO problems based on $f$-divergence or Wasserstein distances, rendering our DRO formulations both statistically optimal and computationally viable.  ( 2 min )
    On Regularized Sparse Logistic Regression. (arXiv:2309.05925v2 [cs.LG] UPDATED)
    Sparse logistic regression performs classification and feature selection simultaneously. Although many studies have been done to solve $\ell_1$-regularized logistic regression, there is no equivalently abundant work on solving sparse logistic regression with nonconvex regularization terms. In this paper, we propose a unified framework to solve $\ell_1$-regularized logistic regression, which can be naturally extended to nonconvex regularization terms, as long as a certain requirement is satisfied. In addition, we also utilize a different line-search criterion to guarantee monotone convergence for various regularization terms. Empirical experiments on binary classification tasks with real-world datasets demonstrate our proposed algorithms are capable of performing classification and feature selection effectively at a lower computational cost.  ( 2 min )
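    A minimal proximal-gradient sketch for the $\ell_1$ case of such a framework is below; it uses a fixed step size rather than a line search, and the synthetic data and hyperparameters are purely illustrative.
```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of the l1 penalty (soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def l1_logistic_prox_grad(X, y, lam=0.1, lr=0.1, iters=500):
    """ISTA-style proximal gradient for l1-regularized logistic regression, y in {0, 1}."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))          # logistic probabilities
        grad = X.T @ (p - y) / n                  # gradient of the average logistic loss
        w = soft_threshold(w - lr * grad, lr * lam)
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((400, 20))
w_true = np.zeros(20)
w_true[:3] = [2.0, -1.5, 1.0]
y = (rng.random(400) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)
print(np.round(l1_logistic_prox_grad(X, y), 2))   # most coordinates shrink to exactly zero
```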
    When, Why and How Much? Adaptive Learning Rate Scheduling by Refinement. (arXiv:2310.07831v1 [cs.LG])
    Learning rate schedules used in practice bear little resemblance to those recommended by theory. We close much of this theory/practice gap, and as a consequence are able to derive new problem-adaptive learning rate schedules. Our key technical contribution is a refined analysis of learning rate schedules for a wide class of optimization algorithms (including SGD). In contrast to most prior works that study the convergence of the average iterate, we study the last iterate, which is what most people use in practice. When considering only worst-case analysis, our theory predicts that the best choice is the linear decay schedule: a popular choice in practice that sets the stepsize proportionally to $1 - t/T$, where $t$ is the current iteration and $T$ is the total number of steps. To go beyond this worst-case analysis, we use the observed gradient norms to derive schedules refined for any particular task. These refined schedules exhibit learning rate warm-up and rapid learning rate annealing near the end of training. Ours is the first systematic approach to automatically yield both of these properties. We perform the most comprehensive evaluation of learning rate schedules to date, evaluating across 10 diverse deep learning problems, a series of LLMs, and a suite of logistic regression problems. We validate that overall, the linear-decay schedule matches or outperforms all commonly used default schedules including cosine annealing, and that our schedule refinement method gives further improvements.  ( 3 min )
    Differentially Private Non-convex Learning for Multi-layer Neural Networks. (arXiv:2310.08425v1 [cs.LG])
    This paper focuses on the problem of Differentially Private Stochastic Optimization for (multi-layer) fully connected neural networks with a single output node. In the first part, we examine cases with no hidden nodes, specifically focusing on Generalized Linear Models (GLMs). We investigate the well-specified model where the random noise has zero mean, and the link function is both bounded and Lipschitz continuous. We propose several algorithms and our analysis demonstrates the feasibility of achieving an excess population risk that remains invariant to the data dimension. We also delve into the scenario involving the ReLU link function, and our findings mirror those of the bounded link function. We conclude this section by contrasting well-specified and misspecified models, using ReLU regression as a representative example. In the second part of the paper, we extend our ideas to two-layer neural networks with sigmoid or ReLU activation functions in the well-specified model. In the third part, we study the theoretical guarantees of DP-SGD in Abadi et al. (2016) for fully connected multi-layer neural networks. By utilizing recent advances in Neural Tangent Kernel theory, we provide the first excess population risk when both the sample size and the width of the network are sufficiently large. Additionally, we discuss the role of some parameters in DP-SGD regarding their utility, both theoretically and empirically.  ( 2 min )
    Conditional Sig-Wasserstein GANs for Time Series Generation. (arXiv:2006.05421v2 [cs.LG] UPDATED)
    Generative adversarial networks (GANs) have been extremely successful in generating samples from seemingly high-dimensional probability measures. However, these methods struggle to capture the temporal dependence of joint probability distributions induced by time-series data. Furthermore, long time-series data streams hugely increase the dimension of the target space, which may render generative modelling infeasible. To overcome these challenges, motivated by the autoregressive models in econometrics, we are interested in the conditional distribution of future time series given the past information. We propose the generic conditional Sig-WGAN framework by integrating Wasserstein-GANs (WGANs) with mathematically principled and efficient path feature extraction called the signature of a path. The signature of a path is a graded sequence of statistics that provides a universal description for a stream of data, and its expected value characterises the law of the time-series model. In particular, we develop the conditional Sig-$W_1$ metric, that captures the conditional joint law of time series models, and use it as a discriminator. The signature feature space enables the explicit representation of the proposed discriminators which alleviates the need for expensive training. We validate our method on both synthetic and empirical datasets and observe that our method consistently and significantly outperforms state-of-the-art benchmarks with respect to measures of similarity and predictive ability.  ( 3 min )
    An interpretable neural network-based non-proportional odds model for ordinal regression. (arXiv:2303.17823v3 [stat.ME] UPDATED)
    This study proposes an interpretable neural network-based non-proportional odds model (N$^3$POM) for ordinal regression. N$^3$POM is different from conventional approaches to ordinal regression with non-proportional models in several ways: (1) N$^3$POM is designed to directly handle continuous responses, whereas standard methods typically treat de facto ordered continuous variables as discrete, (2) instead of estimating response-dependent finite coefficients of linear models from discrete responses as is done in conventional approaches, we train a non-linear neural network to serve as a coefficient function. Thanks to the neural network, N$^3$POM offers flexibility while preserving the interpretability of conventional ordinal regression. We establish a sufficient condition under which the predicted conditional cumulative probability locally satisfies the monotonicity constraint over a user-specified region in the covariate space. Additionally, we provide a monotonicity-preserving stochastic (MPS) algorithm for effectively training the neural network. We apply N$^3$POM to several real-world datasets.  ( 2 min )
    Generalization bounds for neural ordinary differential equations and deep residual networks. (arXiv:2305.06648v2 [stat.ML] UPDATED)
    Neural ordinary differential equations (neural ODEs) are a popular family of continuous-depth deep learning models. In this work, we consider a large family of parameterized ODEs with continuous-in-time parameters, which include time-dependent neural ODEs. We derive a generalization bound for this class by a Lipschitz-based argument. By leveraging the analogy between neural ODEs and deep residual networks, our approach yields in particular a generalization bound for a class of deep residual networks. The bound involves the magnitude of the difference between successive weight matrices. We illustrate numerically how this quantity affects the generalization capability of neural networks.  ( 2 min )
    NECO: NEural Collapse Based Out-of-distribution detection. (arXiv:2310.06823v2 [stat.ML] UPDATED)
    Detecting out-of-distribution (OOD) data is a critical challenge in machine learning due to model overconfidence, often without awareness of their epistemological limits. We hypothesize that ``neural collapse'', a phenomenon affecting in-distribution data for models trained beyond loss convergence, also influences OOD data. To benefit from this interplay, we introduce NECO, a novel post-hoc method for OOD detection, which leverages the geometric properties of ``neural collapse'' and of principal component spaces to identify OOD data. Our extensive experiments demonstrate that NECO achieves state-of-the-art results on both small and large-scale OOD detection tasks while exhibiting strong generalization capabilities across different network architectures. Furthermore, we provide a theoretical explanation for the effectiveness of our method in OOD detection. We plan to release the code after the anonymity period.  ( 2 min )
    Variational Imbalanced Regression: Fair Uncertainty Quantification via Probabilistic Smoothing. (arXiv:2306.06599v4 [cs.LG] UPDATED)
    Existing regression models tend to fall short in both accuracy and uncertainty estimation when the label distribution is imbalanced. In this paper, we propose a probabilistic deep learning model, dubbed variational imbalanced regression (VIR), which not only performs well in imbalanced regression but naturally produces reasonable uncertainty estimation as a byproduct. Different from typical variational autoencoders assuming I.I.D. representations (a data point's representation is not directly affected by other data points), our VIR borrows data with similar regression labels to compute the latent representation's variational distribution; furthermore, different from deterministic regression models producing point estimates, VIR predicts the entire normal-inverse-gamma distributions and modulates the associated conjugate distributions to impose probabilistic reweighting on the imbalanced data, thereby providing better uncertainty estimation. Experiments in several real-world datasets show that our VIR can outperform state-of-the-art imbalanced regression models in terms of both accuracy and uncertainty estimation. Code will soon be available at \url{https://github.com/Wang-ML-Lab/variational-imbalanced-regression}.  ( 2 min )
    Hyperparameter Adaptive Search for Surrogate Optimization: A Self-Adjusting Approach. (arXiv:2310.07970v1 [cs.LG])
    Surrogate Optimization (SO) algorithms have shown promise for optimizing expensive black-box functions. However, their performance is heavily influenced by hyperparameters related to sampling and surrogate fitting, which poses a challenge to their widespread adoption. We investigate the impact of hyperparameters on various SO algorithms and propose a Hyperparameter Adaptive Search for SO (HASSO) approach. HASSO is not a hyperparameter tuning algorithm, but a generic self-adjusting SO algorithm that dynamically tunes its own hyperparameters while concurrently optimizing the primary objective function, without requiring additional evaluations. The aim is to improve the accessibility, effectiveness, and convergence speed of SO algorithms for practitioners. Our approach identifies and modifies the most influential hyperparameters specific to each problem and SO approach, reducing the need for manual tuning without significantly increasing the computational burden. Experimental results demonstrate the effectiveness of HASSO in enhancing the performance of various SO algorithms across different global optimization test problems.  ( 2 min )
    Robust 1-bit Compressed Sensing with Iterative Hard Thresholding. (arXiv:2310.08019v1 [cs.IT])
    In 1-bit compressed sensing, the aim is to estimate a $k$-sparse unit vector $x\in S^{n-1}$ within an $\epsilon$ error (in $\ell_2$) from a minimal number of linear measurements that are quantized to just their signs, i.e., from measurements of the form $y = \mathrm{Sign}(\langle a, x\rangle).$ In this paper, we study a noisy version where a fraction of the measurements can be flipped, potentially by an adversary. In particular, we analyze the Binary Iterative Hard Thresholding (BIHT) algorithm, a proximal gradient descent on a properly defined loss function used for 1-bit compressed sensing, in this noisy setting. It is known from recent results that, with $\tilde{O}(\frac{k}{\epsilon})$ noiseless measurements, BIHT provides an estimate within $\epsilon$ error. This result is optimal and universal, meaning one set of measurements works for all sparse vectors. In this paper, we show that BIHT also provides better results than all known methods for the noisy setting. We show that when up to a $\tau$-fraction of the sign measurements are incorrect (adversarial error), with the same number of measurements as before, BIHT agnostically provides an estimate of $x$ within an $\tilde{O}(\epsilon+\tau)$ error, maintaining the universality of measurements. This establishes stability of iterative hard thresholding in the presence of measurement error. To obtain the result, we use the restricted approximate invertibility of Gaussian matrices, as well as a tight analysis of the high-dimensional geometry of the adversarially corrupted measurements.  ( 3 min )
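    A minimal sketch of the BIHT iteration analyzed here is below; the step size, iteration count, and measurement setup are illustrative choices, not the paper's constants.
```python
import numpy as np

def biht(A, y, k, iters=100, step=1.0):
    """Binary Iterative Hard Thresholding for 1-bit compressed sensing (sketch).

    y = sign(A x), possibly with some flipped signs; each iteration takes a
    (sub)gradient step on the sign-consistency loss and keeps the k largest entries.
    """
    m, n = A.shape
    x = np.zeros(n)
    for _ in range(iters):
        g = A.T @ (y - np.sign(A @ x)) / m        # subgradient of the consistency loss
        z = x + step * g
        idx = np.argsort(np.abs(z))[-k:]          # hard thresholding: keep the top-k entries
        x = np.zeros(n)
        x[idx] = z[idx]
    norm = np.linalg.norm(x)
    return x / norm if norm > 0 else x            # recovery is only up to scale

rng = np.random.default_rng(0)
n, m, k = 100, 600, 5
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
x_true /= np.linalg.norm(x_true)
A = rng.standard_normal((m, n))
y = np.sign(A @ x_true)
y[rng.choice(m, m // 20, replace=False)] *= -1    # flip 5% of the sign measurements
print(np.linalg.norm(biht(A, y, k) - x_true))     # small l2 error despite the flips
```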
    Efficient probabilistic reconciliation of forecasts for real-valued and count time series. (arXiv:2210.02286v3 [stat.ML] UPDATED)
    Hierarchical time series are common in several applied fields. The forecasts for these time series are required to be coherent, that is, to satisfy the constraints given by the hierarchy. The most popular technique to enforce coherence is called reconciliation, which adjusts the base forecasts computed for each time series. However, recent works on probabilistic reconciliation present several limitations. In this paper, we propose a new approach based on conditioning to reconcile any type of forecast distribution. We then introduce a new algorithm, called Bottom-Up Importance Sampling, to efficiently sample from the reconciled distribution. It can be used for any base forecast distribution: discrete, continuous, or in the form of samples, providing a major speedup compared to the current methods. Experiments on several temporal hierarchies show a significant improvement over base probabilistic forecasts.  ( 2 min )
    Memorization with neural nets: going beyond the worst case. (arXiv:2310.00327v2 [stat.ML] UPDATED)
    In practice, deep neural networks are often able to easily interpolate their training data. To understand this phenomenon, many works have aimed to quantify the memorization capacity of a neural network architecture: the largest number of points such that the architecture can interpolate any placement of these points with any assignment of labels. For real-world data, however, one intuitively expects the presence of a benign structure so that interpolation already occurs at a smaller network size than suggested by memorization capacity. In this paper, we investigate interpolation by adopting an instance-specific viewpoint. We introduce a simple randomized algorithm that, given a fixed finite dataset with two classes, with high probability constructs an interpolating three-layer neural network in polynomial time. The required number of parameters is linked to geometric properties of the two classes and their mutual arrangement. As a result, we obtain guarantees that are independent of the number of samples and hence move beyond worst-case memorization capacity bounds. We illustrate the effectiveness of the algorithm in non-pathological situations with extensive numerical experiments and link the insights back to the theoretical results.  ( 2 min )
    Model-Agnostic Covariate-Assisted Inference on Partially Identified Causal Effects. (arXiv:2310.08115v1 [econ.EM])
    Many causal estimands are only partially identifiable since they depend on the unobservable joint distribution between potential outcomes. Stratification on pretreatment covariates can yield sharper partial identification bounds; however, unless the covariates are discrete with relatively small support, this approach typically requires consistent estimation of the conditional distributions of the potential outcomes given the covariates. Thus, existing approaches may fail under model misspecification or if consistency assumptions are violated. In this study, we propose a unified and model-agnostic inferential approach for a wide class of partially identified estimands, based on duality theory for optimal transport problems. In randomized experiments, our approach can wrap around any estimates of the conditional distributions and provide uniformly valid inference, even if the initial estimates are arbitrarily inaccurate. Also, our approach is doubly robust in observational studies. Notably, this property allows analysts to use the multiplier bootstrap to select covariates and models without sacrificing validity even if the true model is not included. Furthermore, if the conditional distributions are estimated at semiparametric rates, our approach matches the performance of an oracle with perfect knowledge of the outcome model. Finally, we propose an efficient computational framework, enabling implementation on many practical problems in causal inference.  ( 2 min )
    Generative modeling of time-dependent densities via optimal transport and projection pursuit. (arXiv:2304.09663v2 [stat.ML] UPDATED)
    Motivated by the computational difficulties incurred by popular deep learning algorithms for the generative modeling of temporal densities, we propose a cheap alternative which requires minimal hyperparameter tuning and scales favorably to high dimensional problems. In particular, we use a projection-based optimal transport solver [Meng et al., 2019] to join successive samples and subsequently use transport splines [Chewi et al., 2020] to interpolate the evolving density. When the sampling frequency is sufficiently high, the optimal maps are close to the identity and are thus computationally efficient to compute. Moreover, the training process is highly parallelizable as all optimal maps are independent and can thus be learned simultaneously. Finally, the approach is based solely on numerical linear algebra rather than minimizing a nonconvex objective function, allowing us to easily analyze and control the algorithm. We present several numerical experiments on both synthetic and real-world datasets to demonstrate the efficiency of our method. In particular, these experiments show that the proposed approach is highly competitive compared with state-of-the-art normalizing flows conditioned on time across a wide range of dimensionalities.  ( 3 min )
    Online RL in Linearly $q^\pi$-Realizable MDPs Is as Easy as in Linear MDPs If You Learn What to Ignore. (arXiv:2310.07811v1 [cs.LG])
    We consider online reinforcement learning (RL) in episodic Markov decision processes (MDPs) under the linear $q^\pi$-realizability assumption, where it is assumed that the action-values of all policies can be expressed as linear functions of state-action features. This class is known to be more general than linear MDPs, where the transition kernel and the reward function are assumed to be linear functions of the feature vectors. As our first contribution, we show that the difference between the two classes is the presence of states in linearly $q^\pi$-realizable MDPs where for any policy, all the actions have approximately equal values, and skipping over these states by following an arbitrarily fixed policy in those states transforms the problem to a linear MDP. Based on this observation, we derive a novel (computationally inefficient) learning algorithm for linearly $q^\pi$-realizable MDPs that simultaneously learns what states should be skipped over and runs another learning algorithm on the linear MDP hidden in the problem. The method returns an $\epsilon$-optimal policy after $\text{polylog}(H, d)/\epsilon^2$ interactions with the MDP, where $H$ is the time horizon and $d$ is the dimension of the feature vectors, giving the first polynomial-sample-complexity online RL algorithm for this setting. The results are proved for the misspecified case, where the sample complexity is shown to degrade gracefully with the misspecification error.  ( 3 min )
    Conformal inference for regression on Riemannian Manifolds. (arXiv:2310.08209v1 [stat.ML])
    Regression on manifolds, and, more broadly, statistics on manifolds, has garnered significant importance in recent years due to the vast number of applications for this type of data. Circular data is a classic example, but so is data in the space of covariance matrices, data on the Grassmannian manifold obtained as a result of principal component analysis, among many others. In this work we investigate prediction sets for regression scenarios when the response variable, denoted by $Y$, resides in a manifold, and the covariable, denoted by $X$, lies in Euclidean space. This extends the concepts delineated in [Lei and Wasserman, 2014] to this novel context. Aligning with traditional principles in conformal inference, these prediction sets are distribution-free, indicating that no specific assumptions are imposed on the joint distribution of $(X, Y)$, and they maintain a non-parametric character. We prove the asymptotic almost sure convergence of the empirical version of these regions on the manifold to their population counterparts. The efficiency of this method is shown through a comprehensive simulation study and an analysis involving real-world data.  ( 2 min )
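    A hedged sketch of the split-conformal construction in this manifold setting: nonconformity scores are geodesic distances between observed and fitted responses, and the prediction set is a geodesic ball around the point prediction. The geodesic-distance function and miscoverage level below are assumptions for illustration, not the paper's exact procedure.
    ```python
    import numpy as np

    def conformal_radius(d_geo, y_cal, y_hat_cal, alpha=0.1):
        """Calibrated radius of the geodesic-ball prediction set (split conformal)."""
        scores = np.array([d_geo(y, yh) for y, yh in zip(y_cal, y_hat_cal)])
        n = len(scores)
        level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)  # finite-sample correction
        return np.quantile(scores, level, method="higher")

    # example for circular responses: geodesic distance = angular distance on the circle
    angular = lambda a, b: np.pi - abs(np.pi - abs(a - b) % (2 * np.pi))
    # the prediction set at a new x is then {y : angular(y, y_hat(x)) <= radius}
    ```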
    Local Graph Clustering with Noisy Labels. (arXiv:2310.08031v1 [cs.LG])
    The growing interest in machine learning problems over graphs with additional node information such as texts, images, or labels has popularized methods that require the costly operation of processing the entire graph. Yet, little effort has been made toward the development of fast local methods (i.e. without accessing the entire graph) that extract useful information from such data. To that end, we propose a study of local graph clustering using noisy node labels as a proxy for additional node information. In this setting, nodes receive initial binary labels based on cluster affiliation: 1 if they belong to the target cluster and 0 otherwise. Subsequently, a fraction of these labels is flipped. We investigate the benefits of incorporating noisy labels for local graph clustering. By constructing a weighted graph with such labels, we study the performance of a graph diffusion-based local clustering method on both the original and the weighted graphs. From a theoretical perspective, we consider recovering an unknown target cluster with a single seed node in a random graph with independent noisy node labels. We provide sufficient conditions on the label noise under which, with high probability, using diffusion in the weighted graph yields a more accurate recovery of the target cluster. This approach proves more effective than using the given labels alone or using diffusion in the label-free original graph. Empirically, we show that reliable node labels can be obtained with just a few samples from an attributed graph. Moreover, utilizing these labels via diffusion in the weighted graph leads to significantly better local clustering performance across several real-world datasets, improving F1 scores by up to 13%.  ( 3 min )
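    As a rough illustration of the diffusion-on-a-weighted-graph idea, here is a personalized PageRank sketch in which edges between two label-1 nodes are up-weighted; the specific weighting rule and sweep procedure are simplifications, not the paper's construction.
    ```python
    import numpy as np

    def label_weighted_ppr(A, labels, seed, gamma=1.0, alpha=0.15, n_iters=200):
        """Personalized PageRank on a graph reweighted by noisy binary labels."""
        W = A * (1.0 + gamma * np.outer(labels, labels))  # boost edges inside the labeled set
        d = W.sum(axis=1)
        P = W / d[:, None]                                # row-stochastic transition matrix
        s = np.zeros(A.shape[0]); s[seed] = 1.0           # mass concentrated on the seed node
        p = s.copy()
        for _ in range(n_iters):
            p = alpha * s + (1 - alpha) * (P.T @ p)
        # rank nodes by degree-normalized diffusion mass (sweep-cut style recovery)
        return np.argsort(-p / d)
    ```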
    Evaluation of ChatGPT-Generated Medical Responses: A Systematic Review and Meta-Analysis. (arXiv:2310.08410v1 [stat.ME])
    Large language models such as ChatGPT are increasingly explored in medical domains. However, the absence of standard guidelines for performance evaluation has led to methodological inconsistencies. This study aims to summarize the available evidence on evaluating ChatGPT's performance in medicine and provide direction for future research. We searched ten medical literature databases on June 15, 2023, using the keyword "ChatGPT". A total of 3520 articles were identified, of which 60 were reviewed and summarized in this paper and 17 were included in the meta-analysis. The analysis showed that ChatGPT displayed an overall integrated accuracy of 56% (95% CI: 51%-60%, I2 = 87%) in addressing medical queries. However, the studies varied in question resource, question-asking process, and evaluation metrics. Moreover, many studies failed to report methodological details, including the version of ChatGPT and whether each question was used independently or repeatedly. Our findings revealed that although ChatGPT demonstrated considerable potential for application in healthcare, the heterogeneity of the studies and insufficient reporting may affect the reliability of these results. Further well-designed studies with comprehensive and transparent reporting are needed to evaluate ChatGPT's performance in medicine.  ( 2 min )
    Extensions of Heterogeneity in Integration and Prediction (HIP) with R Shiny Application. (arXiv:2310.08426v1 [stat.ME])
    Multiple data views measured on the same set of participants are becoming more common and have the potential to deepen our understanding of many complex diseases by analyzing these different views simultaneously. Equally important, many of these complex diseases show evidence of subgroup heterogeneity (e.g., by sex or race). HIP (Heterogeneity in Integration and Prediction) is among the first methods proposed to integrate multiple data views while also accounting for subgroup heterogeneity to identify common and subgroup-specific markers of a particular disease. However, HIP is applicable to continuous outcomes and requires programming expertise by the user. Here we propose extensions to HIP that accommodate multi-class, Poisson, and Zero-Inflated Poisson outcomes while retaining the benefits of HIP. Additionally, we introduce an R Shiny application, accessible on shinyapps.io at https://multi-viewlearn.shinyapps.io/HIP_ShinyApp/, that provides an interface with the Python implementation of HIP to allow more researchers to use the method anywhere and on any device. We applied HIP to identify genes and proteins common and specific to males and females that are associated with exacerbation frequency. Although some of the identified genes and proteins show evidence of a relationship with chronic obstructive pulmonary disease (COPD) in existing literature, others may be candidates for future research investigating their relationship with COPD. We demonstrate the use of the Shiny application with publicly available data. An R package for HIP will be made available at https://github.com/lasandrall/HIP.  ( 3 min )
    Lion Secretly Solves Constrained Optimization: As Lyapunov Predicts. (arXiv:2310.05898v2 [cs.LG] UPDATED)
    Lion (Evolved Sign Momentum), a new optimizer discovered through program search, has shown promising results in training large AI models. It performs comparably or favorably to AdamW but with greater memory efficiency. As we can expect from the results of a random search program, Lion incorporates elements from several existing algorithms, including signed momentum, decoupled weight decay, Polak, and Nesterov momentum, but does not fit into any existing category of theoretically grounded optimizers. Thus, even though Lion appears to perform well as a general-purpose optimizer for a wide range of tasks, its theoretical basis remains uncertain. This lack of theoretical clarity limits opportunities to further enhance and expand Lion's efficacy. This work aims to demystify Lion. Based on both continuous-time and discrete-time analysis, we demonstrate that Lion is a theoretically novel and principled approach for minimizing a general loss function $f(x)$ while enforcing a bound constraint $\|x\|_\infty \leq 1/\lambda$. Lion achieves this through the incorporation of decoupled weight decay, where $\lambda$ represents the weight decay coefficient. Our analysis is made possible by the development of a new Lyapunov function for the Lion updates. It applies to a broader family of Lion-$\kappa$ algorithms, where the $\text{sign}(\cdot)$ operator in Lion is replaced by the subgradient of a convex function $\kappa$, leading to the solution of a general composite optimization problem of $\min_x f(x) + \kappa^*(x)$. Our findings provide valuable insights into the dynamics of Lion and pave the way for further improvements and extensions of Lion-related algorithms.  ( 3 min )
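    For reference, the Lion update discussed above is short enough to state directly: a sign-of-interpolated-momentum step plus decoupled weight decay. The sketch below uses illustrative hyperparameters; it is not the paper's tuned recipe.
    ```python
    import torch

    def lion_step(param, grad, m, lr=1e-4, beta1=0.9, beta2=0.99, weight_decay=1.0):
        """One Lion update: sign(beta1*m + (1-beta1)*g) step with decoupled weight decay."""
        update = torch.sign(beta1 * m + (1 - beta1) * grad)
        # the decoupled decay term is what enforces the bound ||x||_inf <= 1/lambda
        param.data.add_(update + weight_decay * param.data, alpha=-lr)
        # the momentum buffer is updated with a second interpolation coefficient
        m.mul_(beta2).add_(grad, alpha=1 - beta2)
        return param, m
    ```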
    Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining. (arXiv:2310.08566v1 [cs.LG])
    Large transformer models pretrained on offline reinforcement learning datasets have demonstrated remarkable in-context reinforcement learning (ICRL) capabilities, where they can make good decisions when prompted with interaction trajectories from unseen environments. However, when and how transformers can be trained to perform ICRL have not been theoretically well-understood. In particular, it is unclear which reinforcement-learning algorithms transformers can perform in context, and how distribution mismatch in offline training data affects the learned algorithms. This paper provides a theoretical framework that analyzes supervised pretraining for ICRL. This includes two recently proposed training methods -- algorithm distillation and decision-pretrained transformers. First, assuming model realizability, we prove the supervised-pretrained transformer will imitate the conditional expectation of the expert algorithm given the observed trajectory. The generalization error will scale with model capacity and a distribution divergence factor between the expert and offline algorithms. Second, we show transformers with ReLU attention can efficiently approximate near-optimal online reinforcement learning algorithms like LinUCB and Thompson sampling for stochastic linear bandits, and UCB-VI for tabular Markov decision processes. This provides the first quantitative analysis of the ICRL capabilities of transformers pretrained from offline trajectories.  ( 2 min )
    Variable Selection for Kernel Two-Sample Tests. (arXiv:2302.07415v3 [stat.ML] UPDATED)
    We consider the variable selection problem for two-sample tests, aiming to select the most informative variables to distinguish samples from two groups. To solve this problem, we propose a framework based on the kernel maximum mean discrepancy (MMD). Our approach seeks a group of variables with a pre-specified size that maximizes the variance-regularized MMD statistics. This formulation also corresponds to the minimization of asymptotic type-II error while controlling type-I error, as studied in the literature. We present mixed-integer programming formulations and develop exact and approximation algorithms with performance guarantees for different choices of kernel functions. Furthermore, we provide a statistical testing power analysis of our proposed framework. Experiment results on synthetic and real datasets demonstrate the superior performance of our approach.  ( 2 min )
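    To make the objective concrete, here is the standard unbiased MMD^2 estimate with a Gaussian kernel restricted to a candidate variable subset; the variance regularization and the mixed-integer search over subsets from the paper are not shown.
    ```python
    import numpy as np

    def mmd2_on_subset(X, Y, subset, bandwidth=1.0):
        """Unbiased MMD^2 between samples X and Y using only the selected variables."""
        Xs, Ys = X[:, subset], Y[:, subset]
        def k(a, b):
            d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
            return np.exp(-d2 / (2 * bandwidth ** 2))
        n, m = len(Xs), len(Ys)
        Kxx, Kyy, Kxy = k(Xs, Xs), k(Ys, Ys), k(Xs, Ys)
        np.fill_diagonal(Kxx, 0.0)
        np.fill_diagonal(Kyy, 0.0)
        return Kxx.sum() / (n * (n - 1)) + Kyy.sum() / (m * (m - 1)) - 2 * Kxy.mean()
    ```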
    Lattice real-time simulations with learned optimal kernels. (arXiv:2310.08053v1 [hep-lat])
    We present a simulation strategy for the real-time dynamics of quantum fields, inspired by reinforcement learning. It builds on the complex Langevin approach, which it amends with system specific prior information, a necessary prerequisite to overcome this exceptionally severe sign problem. The optimization process underlying our machine learning approach is made possible by deploying inherently stable solvers of the complex Langevin stochastic process and a novel optimality criterion derived from insight into so-called boundary terms. This conceptual and technical progress allows us to both significantly extend the range of real-time simulations in 1+1d scalar field theory beyond the state-of-the-art and to avoid discretization artifacts that plagued previous real-time field theory simulations. Limitations and promising future directions are discussed.  ( 2 min )
    On the Computational Complexity of Private High-dimensional Model Selection via the Exponential Mechanism. (arXiv:2310.07852v1 [stat.ML])
    We consider the problem of model selection in a high-dimensional sparse linear regression model under the differential privacy framework. In particular, we consider the problem of differentially private best subset selection and study its utility guarantee. We adopt the well-known exponential mechanism for selecting the best model, and under a certain margin condition, we establish its strong model recovery property. However, the exponential search space of the exponential mechanism poses a serious computational bottleneck. To overcome this challenge, we propose a Metropolis-Hastings algorithm for the sampling step and establish its polynomial mixing time to its stationary distribution in the problem parameters $n,p$, and $s$. Furthermore, we also establish approximate differential privacy for the final estimates of the Metropolis-Hastings random walk using its mixing property. Finally, we also perform some illustrative simulations that echo the theoretical findings of our main results.  ( 2 min )
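    A simplified sketch of the sampling step described above: a Metropolis-Hastings walk over size-$s$ supports whose acceptance probability follows the exponential mechanism. The utility below (negative residual sum of squares) and the omitted sensitivity scaling are illustrative assumptions, not the paper's exact construction.
    ```python
    import numpy as np

    def mh_subset_walk(X, y, s, epsilon, n_steps=1000, rng=np.random):
        """Metropolis-Hastings over size-s supports with exponential-mechanism acceptance."""
        p = X.shape[1]
        def utility(S):
            beta, *_ = np.linalg.lstsq(X[:, S], y, rcond=None)
            return -np.sum((y - X[:, S] @ beta) ** 2)
        S = list(rng.choice(p, s, replace=False))
        for _ in range(n_steps):
            # propose swapping one selected index for one unselected index
            drop = rng.choice(S)
            add = rng.choice(list(set(range(p)) - set(S)))
            S_new = [j for j in S if j != drop] + [add]
            log_ratio = (epsilon / 2) * (utility(S_new) - utility(S))
            if np.log(rng.rand()) < log_ratio:
                S = S_new
        return sorted(S)
    ```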
    Learning Regularized Monotone Graphon Mean-Field Games. (arXiv:2310.08089v1 [cs.GT])
    This paper studies two fundamental problems in regularized Graphon Mean-Field Games (GMFGs). First, we establish the existence of a Nash Equilibrium (NE) of any $\lambda$-regularized GMFG (for $\lambda\geq 0$). This result relies on weaker conditions than those in previous works for analyzing both unregularized GMFGs ($\lambda=0$) and $\lambda$-regularized MFGs, which are special cases of GMFGs. Second, we propose provably efficient algorithms to learn the NE in weakly monotone GMFGs, motivated by Lasry and Lions [2007]. Previous literature either only analyzed continuous-time algorithms or required extra conditions to analyze discrete-time algorithms. In contrast, we design a discrete-time algorithm and derive its convergence rate solely under weakly monotone conditions. Furthermore, we develop and analyze the action-value function estimation procedure during the online learning process, which is absent from algorithms for monotone GMFGs. This serves as a sub-module in our optimization algorithm. The efficiency of the designed algorithm is corroborated by empirical evaluations.  ( 2 min )
    Efficient Integrators for Diffusion Generative Models. (arXiv:2310.07894v1 [cs.LG])
    Diffusion models suffer from slow sample generation at inference time. Therefore, developing a principled framework for fast deterministic/stochastic sampling for a broader class of diffusion models is a promising direction. We propose two complementary frameworks for accelerating sample generation in pre-trained models: Conjugate Integrators and Splitting Integrators. Conjugate integrators generalize DDIM, mapping the reverse diffusion dynamics to a more amenable space for sampling. In contrast, splitting-based integrators, commonly used in molecular dynamics, reduce the numerical simulation error by cleverly alternating between numerical updates involving the data and auxiliary variables. After extensively studying these methods empirically and theoretically, we present a hybrid method that leads to the best-reported performance for diffusion models in augmented spaces. Applied to Phase Space Langevin Diffusion [Pandey & Mandt, 2023] on CIFAR-10, our deterministic and stochastic samplers achieve FID scores of 2.11 and 2.36 in only 100 network function evaluations (NFE) as compared to 2.57 and 2.63 for the best-performing baselines, respectively. Our code and model checkpoints will be made publicly available at \url{https://github.com/mandt-lab/PSLD}.  ( 2 min )
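    For context, the deterministic DDIM update that conjugate integrators generalize can be written in a few lines; eps_pred is the pretrained network's noise prediction, and the schedule values are assumed inputs.
    ```python
    import torch

    def ddim_step(x_t, eps_pred, alpha_bar_t, alpha_bar_prev):
        """One deterministic (eta = 0) DDIM update from noise level t to the previous level."""
        # data estimate implied by the current sample and the predicted noise
        x0_pred = (x_t - torch.sqrt(1 - alpha_bar_t) * eps_pred) / torch.sqrt(alpha_bar_t)
        # recombine at the previous noise level along the deterministic path
        return torch.sqrt(alpha_bar_prev) * x0_pred + torch.sqrt(1 - alpha_bar_prev) * eps_pred
    ```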
    How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?. (arXiv:2310.08391v1 [stat.ML])
    Transformers pretrained on diverse tasks exhibit remarkable in-context learning (ICL) capabilities, enabling them to solve unseen tasks solely based on input contexts without adjusting model parameters. In this paper, we study ICL in one of its simplest setups: pretraining a linearly parameterized single-layer linear attention model for linear regression with a Gaussian prior. We establish a statistical task complexity bound for the attention model pretraining, showing that effective pretraining only requires a small number of independent tasks. Furthermore, we prove that the pretrained model closely matches the Bayes optimal algorithm, i.e., optimally tuned ridge regression, by achieving nearly Bayes optimal risk on unseen tasks under a fixed context length. These theoretical findings complement prior experimental research and shed light on the statistical foundations of ICL.  ( 2 min )
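    The Bayes-optimal baseline referenced above is ordinary ridge regression with its penalty set by the noise-to-prior variance ratio; a minimal sketch, assuming a Gaussian prior $N(0, \tau^2 I)$ on the weights and noise variance $\sigma^2$:
    ```python
    import numpy as np

    def bayes_optimal_ridge(X, y, prior_var, noise_var):
        """Posterior-mean (optimally tuned ridge) estimate under a Gaussian prior."""
        d = X.shape[1]
        lam = noise_var / prior_var          # optimal ridge penalty sigma^2 / tau^2
        return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
    ```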
    Personalised dynamic super learning: an application in predicting hemodiafiltration's convection volumes. (arXiv:2310.08479v1 [stat.ME])
    Obtaining continuously updated predictions is a major challenge for personalised medicine. Leveraging combinations of parametric regressions and machine learning approaches, the personalised online super learner (POSL) can achieve such dynamic and personalised predictions. We adapt POSL to predict a repeated continuous outcome dynamically and propose a new way to validate such personalised or dynamic prediction models. We illustrate its performance by predicting the convection volume of patients undergoing hemodiafiltration. POSL outperformed its candidate learners with respect to median absolute error, calibration-in-the-large, discrimination, and net benefit. We finally discuss the choices and challenges underlying the use of POSL.  ( 2 min )
    L2P: Learning to Place for Estimating Heavy-Tailed Distributed Outcomes. (arXiv:1908.04628v3 [cs.LG] UPDATED)
    Many real-world prediction tasks have outcome variables that have characteristic heavy-tail distributions. Examples include copies of books sold, auction prices of art pieces, demand for commodities in warehouses, etc. By learning heavy-tailed distributions, "big and rare" instances (e.g., the best-sellers) will have accurate predictions. Most existing approaches are not dedicated to learning heavy-tailed distribution; thus, they heavily under-predict such instances. To tackle this problem, we introduce Learning to Place (L2P), which exploits the pairwise relationships between instances for learning. In its training phase, L2P learns a pairwise preference classifier: is instance A > instance B? In its placing phase, L2P obtains a prediction by placing the new instance among the known instances. Based on its placement, the new instance is then assigned a value for its outcome variable. Experiments on real data show that L2P outperforms competing approaches in terms of accuracy and ability to reproduce heavy-tailed outcome distribution. In addition, L2P provides an interpretable model by placing each predicted instance in relation to its comparable neighbors. Interpretable models are highly desirable when lives and treasure are at stake.  ( 3 min )
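    A rough sketch of the placing phase described above, assuming a trained pairwise classifier pref_clf(a, b) that returns P(outcome of a > outcome of b); the placement and value-assignment rules here are simplified for illustration, not the paper's exact procedure.
    ```python
    import numpy as np

    def l2p_place(pref_clf, x_new, X_known, y_known):
        """Place a new instance among known instances and read off an outcome value."""
        order = np.argsort(y_known)
        X_sorted, y_sorted = X_known[order], y_known[order]
        # count how many known instances the new one is predicted to exceed
        wins = sum(pref_clf(x_new, x_i) > 0.5 for x_i in X_sorted)
        if wins == 0:
            return y_sorted[0]
        if wins == len(y_sorted):
            return y_sorted[-1]
        # assign the midpoint of the two neighbors it lands between
        return 0.5 * (y_sorted[wins - 1] + y_sorted[wins])
    ```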
    Statistical Performance Guarantee for Selecting Those Predicted to Benefit Most from Treatment. (arXiv:2310.07973v1 [stat.ME])
    Across a wide array of disciplines, many researchers use machine learning (ML) algorithms to identify a subgroup of individuals, called exceptional responders, who are likely to be helped by a treatment the most. A common approach consists of two steps. One first estimates the conditional average treatment effect or its proxy using an ML algorithm. They then determine the cutoff of the resulting treatment prioritization score to select those predicted to benefit most from the treatment. Unfortunately, these estimated treatment prioritization scores are often biased and noisy. Furthermore, utilizing the same data to both choose a cutoff value and estimate the average treatment effect among the selected individuals suffers from a multiple testing problem. To address these challenges, we develop a uniform confidence band for experimentally evaluating the sorted average treatment effect (GATES) among the individuals whose treatment prioritization score is at least as high as any given quantile value, regardless of how the quantile is chosen. This provides a statistical guarantee that the GATES for the selected subgroup exceeds a certain threshold. The validity of the proposed methodology depends solely on randomization of treatment and random sampling of units without requiring modeling assumptions or resampling methods. This widens its applicability to a wide range of other causal quantities. A simulation study shows that the empirical coverage of the proposed uniform confidence bands is close to the nominal coverage when the sample is as small as 100. We analyze a clinical trial of late-stage prostate cancer and find a relatively large proportion of exceptional responders with a statistical performance guarantee.  ( 3 min )
    Towards a Unified Analysis of Kernel-based Methods Under Covariate Shift. (arXiv:2310.08237v1 [stat.ML])
    Covariate shift occurs prevalently in practice, where the input distributions of the source and target data are substantially different. Despite its practical importance in various learning problems, most of the existing methods only focus on some specific learning tasks and are not well validated theoretically and numerically. To tackle this problem, we propose a unified analysis of general nonparametric methods in a reproducing kernel Hilbert space (RKHS) under covariate shift. Our theoretical results are established for a general loss belonging to a rich loss function family, which includes many commonly used methods as special cases, such as mean regression, quantile regression, likelihood-based classification, and margin-based classification. Two types of covariate shift problems are the focus of this paper and the sharp convergence rates are established for a general loss function to provide a unified theoretical analysis, which concurs with the optimal results in literature where the squared loss is used. Extensive numerical studies on synthetic and real examples confirm our theoretical findings and further illustrate the effectiveness of our proposed method.  ( 2 min )
    Towards the Fundamental Limits of Knowledge Transfer over Finite Domains. (arXiv:2310.07838v1 [cs.LG])
    We characterize the statistical efficiency of knowledge transfer through $n$ samples from a teacher to a probabilistic student classifier with input space $\mathcal S$ over labels $\mathcal A$. We show that privileged information at three progressive levels accelerates the transfer. At the first level, only samples with hard labels are known, via which the maximum likelihood estimator attains the minimax rate $\sqrt{{|{\mathcal S}||{\mathcal A}|}/{n}}$. The second level has the teacher probabilities of sampled labels available in addition, which turns out to boost the convergence rate lower bound to ${{|{\mathcal S}||{\mathcal A}|}/{n}}$. However, under this second data acquisition protocol, minimizing a naive adaptation of the cross-entropy loss results in an asymptotically biased student. We overcome this limitation and achieve the fundamental limit by using a novel empirical variant of the squared error logit loss. The third level further equips the student with the soft labels (complete logits) on ${\mathcal A}$ given every sampled input, thereby provably enabling the student to enjoy a rate ${|{\mathcal S}|}/{n}$ free of $|{\mathcal A}|$. We find any Kullback-Leibler divergence minimizer to be optimal in the last case. Numerical simulations distinguish the four learners and corroborate our theory.  ( 2 min )
  • Open

    [D] Is there a good code or text model?
    I am trying to detect code segments in a text response of an LLM, so I can highlight them using Highlight.js. Is there a good model that can classify a block of text and decide whether it is a block of code or a block of plain English text? submitted by /u/Particular_Flower_12 [link] [comments]  ( 9 min )
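    In the absence of a dedicated model, a lightweight heuristic can already separate code from prose reasonably well; the token list and threshold below are assumptions, and a fine-tuned text classifier could replace the scoring function. If the LLM emits fenced blocks (```), splitting on the fences first is usually more reliable than classifying raw text.
    ```python
    import re

    CODE_HINTS = re.compile(
        r"(^\s{4,}|\t|[{};]|=>|==|!=|def |class |import |return |const |let |var |#include|</?\w+>)",
        re.MULTILINE,
    )

    def looks_like_code(block: str, threshold: float = 0.3) -> bool:
        """Score each non-empty line by code-indicative tokens; flag the block as code if enough lines match."""
        lines = [l for l in block.splitlines() if l.strip()]
        if not lines:
            return False
        hits = sum(bool(CODE_HINTS.search(l)) for l in lines)
        return hits / len(lines) >= threshold
    ```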

  • Open

    Savage Dall-e 3 delivers "Average reddit post"
    submitted by /u/Zimmax [link] [comments]
    AI — weekly megathread!
    News provided by aibrews.com Researchers present LLark: A Multimodal Foundation Model for Music - an open-source instruction-tuned multimodal model for music understanding. LLark is trained entirely from open-source music data and models [Demo | Paper] Researchers released LLaVA-1.5. LLaVA (Large Language and Vision Assistant) is an open-source large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding. LLaVA-1.5 achieved SoTA on 11 benchmarks, with just simple modifications to the original LLaVA and completed training in ~1 day on a single 8-A100 node [Demo | Paper | GitHub]. Voice AI platform ElevenLabs released AI Dubbing tool that enables users to automatically translate any audio in a video into a different language whil…
    The AI Boom Could Use a Shocking Amount of Electricity
    The rapid growth of artificial intelligence (AI) could lead to a significant increase in global electricity consumption, according to a peer-reviewed analysis published in Joule. The analysis estimates that if current trends continue, AI could drive the demand for electricity in data centers to consume at least 85.4 terawatt-hours annually, which is more than what many small countries use in a year. AI is energy-intensive, with both the training and inference phases requiring a significant amount of energy. The size of AI models, such as large language models, and the location of data centers also contribute to energy usage. Factors such as cooling requirements and the type of hardware used can impact energy consumption. Source : https://www.scientificamerican.com/article/the-ai-boom-could-use-a-shocking-amount-of-electricity/ submitted by /u/NuseAI [link] [comments]
    Lemur: Harmonizing Natural Language and Code for Language Agents
    Today's conversational bots like Claude and GPT can chat impressively but aren't great at complex planning or executing technical tasks. To overcome this, new research from HKU builds open-source AI agents that blend natural language and coding skills. They're called Lemur and Lemur-Chat. The researchers think achieving versatile real-world agents requires models that integrate both fluid natural language abilities and precise programming language control. Humans combine plain speech for higher-level goals with languages like Python when we need to plan intricately and execute exactly. AI needs both capacities too. But most existing models specialize in pure language or pure code. There's a separation that is limiting. The team created Lemur by pretraining the open-source Llama-2 on a massive mixed corpus with 10x more natural language than code. This improved its programming abilities while retaining conversational strength. Further instruction tuning optimized Lemur-Chat for following free-form directions in language. Experiments found Lemur surpassed specialized coding-only models like Codex in overall benchmarks. Lemur-Chat then exceeded Lemur by 15% after instruction tuning. More importantly, Lemur-Chat won 12/13 new "agent tests" designed to mimic real-world challenges needing both language and programming prowess. It beat alternatives at: Using tools like Python and Wikipedia to enhance reasoning Debugging code by leveraging error messages Improving the most from natural language feedback Exploring partially observable environments like cybersecurity and web browsing simulations. Lemur-Chat matched GPT-3.5 in many tests, closing the gap between commercial and open-source agents. TLDR: New open-source AI agents combine coding and language skills. Experiments show the combo unlocks more performance across technical challenges. Full summary is here. Paper is here. submitted by /u/Successful-Western27 [link] [comments]
    Henry Kissinger: The Path to AI Arms Control
    submitted by /u/ForeignAffairsMag [link] [comments]
    A 21-year-old won $40,000 for using AI to read the first word on a 2,000-year-old papyrus scroll buried by Mount Vesuvius
    submitted by /u/thisisinsider [link] [comments]
    "Special Announcement: John Carmack & Rich Sutton partner to accelerate development of AGI" | "Carmack and Sutton are deeply focused on developing a genuine AI prototype by 2030, including establishing, advancing, and documenting AGI signs of life"
    submitted by /u/Tao_Dragon [link] [comments]
    Dumbing down or wising up: how will generative AI change the way we think?
    submitted by /u/Jariiari7 [link] [comments]
    One-Minute Daily AI News 10/13/2023
    In a recent article published in the journal Nature, researchers developed AI Tool EVEscape, a tool to forecast which severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) strains have the highest potential to escape host immunity.[1] Microsoft seems to be working on the possible development of an artificial intelligence (AI) system that can understand and resolve customer support requests using natural language processing.[2] Google’s Search Generative Experience (SGE) will let you create images right from a text prompt starting Thursday.[3] The Biden administration is considering closing a loophole that gives Chinese companies access to American artificial intelligence (AI) chips through units located overseas, according to four people familiar with the matter.[4] Sources: [1] https://www.news-medical.net/news/20231012/EVScape-New-tool-to-forecast-which-SARS-CoV-2-variants-could-dodge-our-immunity.aspx [2] https://winbuzzer.com/2023/10/11/microsoft-gears-up-for-a-revolutionary-natural-language-customer-support-ai-xcxwbn/ [3] https://www.theverge.com/2023/10/12/23913337/google-ai-powered-search-sge-images-written-drafts [4] https://www.reuters.com/technology/biden-eyes-adding-ai-chip-curbs-chinese-companies-abroad-2023-10-13/ submitted by /u/Excellent-Target-847 [link] [comments]
    I’ve created an audiobook generator. Anyone got any books to test on it? Each character is given a different voice.
    Also, if anyone has someone who should be included as a voice actor, it can also clone voices. Idk, I need to make sure it works for a wide variety of books, as long as they don’t use ' for quotes, because the computer gets confused when “I’ve” and such uses the same symbol submitted by /u/Impossible_Belt_7757 [link] [comments]
    Check out the latest episode of my history podcast on the future of A.I.!
    submitted by /u/ErikSlader713 [link] [comments]
    Drew a picture in paint, threw it in hotpot, and it came out a stylish, halloweenish picture. Damn this stuff is amazing.
    submitted by /u/kipaxbooks [link] [comments]
  • Open

    [P] App for iOS and M1 macOS for image bounding box annotation
    ClassifyML is an application for creating specialised image datasets for use with an ML training algorithm. Simply import your chosen images into the app via file manager, drag'n'drop or the on device camera and create your bounding boxes and then export your images and JSON into a structured folder. LINK: https://apps.apple.com/app/classify-ml/id6461013113 submitted by /u/LiamRogers99 [link] [comments]  ( 9 min )
    [D] What are the best resources for learning reinforcement learning?
    Recently I came across Open AI's Spinning Up Project, which seems to be well structured, but quite introductory. What are some resources you use for learning RL? submitted by /u/OwnAd9305 [link] [comments]  ( 9 min )
    [D] LLM for entity/scene recognition in a book?
    Hello, I'm looking for an open source LLM that can extract all the characters from an inputted book, and isolate passages with descriptive writing that involves imagery. Can anyone suggest me something? Thanks! submitted by /u/slomorosh [link] [comments]  ( 9 min )
    [P] Deploy and Run LLMs at the Edge: Use Code Llama to Generate a Dashboard in a Network Restricted Environment
    In this blog, we explore different definitions of “the edge,” and understand the factors driving AI/ML to the edge. We examine why the trends of LLMs and edge computing are intersecting now, and how teams can take advantage of their combined power today. We also demonstrate how LLMs can be used in an edge environment to generate insights for a real-world use case today. Consider a geologist working in a remote oil field who is responsible for building and analyzing 3D models of oil fields to determine production capacity and the impact on profitability. In this demo, we walk through how Code Llama, Chassisml.io, and Modzy could be used to build a dashboard that geologists could use to analyze well data in real-time in a remote, network restricted environment, allowing for LLM insights generated at the edge. Learn more: https://www.modzy.com/modzy-blog/deploy-and-run-llms-at-the-edge submitted by /u/modzykirsten [link] [comments]  ( 9 min )
    [D] ICLR submissions are out. Discussion thread
    https://openreview.net/group?id=ICLR.cc/2024/Conference submitted by /u/_puhsu [link] [comments]  ( 8 min )
    [D] VS Code issue
    I am running AutoTokenizer from transformers in VS Code. VS Code crashes, showing an error and not responding. I don't understand what's wrong. submitted by /u/ArtichokeOne5897 [link] [comments]  ( 8 min )
    "[P]" Utilizing Machine Learning Techniques for Document Digitalization Project
    Hey Guys, ​ I am currently spearheading a project for a client in the insurance industry, with a primary objective being the digitalization of thousands of hardcopy contracts. The ultimate goal is to automatically extract particular information from these newly digital documents, namely "date", "insurance premium", "insurance type", and "contractor's name". However, I anticipate a level of variability in terms of exact terminology used, particularly with regards to "insurance premium" and "insurance type". (There is no handwritten text) ​ I am keen on sharing the methodology I intend to apply for this project and invite your invaluable feedback and suggestions: ​ - Firstly, I'll execute the scanning/digitalization of the documents manually. - Post this, I plan to utilize Tesseract in combination with Python for the extraction of text from the preprocessed images. - I am considering using libraries such as NLTK or spaCy to preprocess this text (this will involve steps like lower casing, removing punctuations, etc.) - Finally, I plan to train a custom model for Named Entity Recognition (NER), to accommodate the potential semantic variations in entity labeling which are specific to entities like "insurance premium" and "insurance type". ​ I would be immensely grateful if I could gain your insights on the above-proposed pipeline - Are there any glaring pitfalls I need to avoid or perhaps some improvements that I could incorporate? Your expert advice can certainly help ensure the success of this venture. ​ Many thanks in anticipation for your time and valuable inputs! submitted by /u/Background_Thanks604 [link] [comments]  ( 9 min )
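    A minimal sketch of the proposed pipeline, assuming the Tesseract binary is installed and using a stock spaCy model as a stand-in for the custom-trained NER component (the model name and labels are placeholders, not trained artifacts):
    ```python
    import pytesseract
    from PIL import Image
    import spacy

    def extract_entities(image_path, nlp=None):
        """OCR a scanned contract page, then run an NER model over the raw text."""
        text = pytesseract.image_to_string(Image.open(image_path))
        nlp = nlp or spacy.load("en_core_web_sm")  # replace with the custom-trained NER model
        doc = nlp(text.lower())
        return [(ent.text, ent.label_) for ent in doc.ents]
    ```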
    [News] AI & ML conference in San Francisco [Special discount code for this subreddit]
    I work for this database company SingleStore and we are hosting a AI & ML conference in San Francisco on 17th of October, 2023. It is an in-person conference with amazing speakers line-up like Harrison Chase, co-founder and CEO of LangChain and many more. We will have hands-on workshops, swags giveaway and much more. I don't know if it makes sense to share this but I believe it might help some of you near San Francisco to go and meet the industry leaders and network with other data engineering folks. Use my discount coupon code 'PAVAN100OFF' to avail 100% off on the ticket price. (the original ticket price is $199) Get your tickets now! submitted by /u/PavanBelagatti [link] [comments]  ( 9 min )
    Using RAG on CoreML version of Llama2 [P]
    Has anyone ever attempted this or finetuning before on the CoreML version? I’m currently trying to and I’m not even sure where to start tbh. CoreML version of Llama 2: https://huggingface.co/coreml-projects/Llama-2-7b-chat-coreml submitted by /u/Inside-Aromatic [link] [comments]  ( 9 min )
    [D] How does L1 Regularization able to drive a coefficient to zero?
    Hi all, I’m studying the concepts of machine learning. However, I am stuck because I still don’t see how introducing a penalty using lasso regression can drive some parameter coefficients to zero. When doing the calculations, I only get the final value (ordinary least squares + penalty) and don’t directly see a coefficient value being reduced. I've looked at many materials and resources trying to explain this, but I still can't see how it's done. I think the important thing for me is seeing it going to zero or, at the very least, seeing it during calculation. Is there anyone that can help explain this better? Or, If you know of a formula that I can derive that, during the derivation process, shows a coefficient being reduced or set to zero, that would also help. Also, any good resources on the topic would be appreciated. Edit: This post should have been posted in r/learnmachinelearning here is a link to the same post in that subreddit submitted by /u/thismymind [link] [comments]  ( 9 min )
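    The coordinate-wise lasso update makes the zeroing explicit: because |β| has a kink at zero, the one-dimensional optimum is exactly zero whenever the unpenalized coordinate value falls inside [-λ, λ], whereas the ridge penalty only shrinks it. A tiny worked example (the numbers are made up):
    ```python
    import numpy as np

    def soft_threshold(rho, lam):
        """Closed-form lasso coordinate update: exact zero when |rho| <= lam."""
        return np.sign(rho) * max(abs(rho) - lam, 0.0)

    rho_weak, rho_strong, lam = 0.4, 3.0, 0.8
    print(soft_threshold(rho_weak, lam))    # 0.0 -> coefficient driven exactly to zero
    print(soft_threshold(rho_strong, lam))  # 2.2 -> coefficient shrunk but kept
    ```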
    [D] How do you pre-pay OpenAI compute credit with university funds?
    I am an academic and I have some funding. However, I cannot just plug in my lab card with a recurrent payment; procedures don't allow it. Is there a way to "top up" some compute credits on the OpenAI accounts? Is anyone having the same problem? Thanks. submitted by /u/Jean-Porte [link] [comments]  ( 9 min )
    [R] Seeking Guidance on Efficiently Classifying and Cleansing Automotive Data with Python
    Hi, we are working on a project that involves dealing with messy automotive data, and are looking for guidance on possible approaches and tools. We aim to map messy supplier data of car makes/models to standardized values from our approved list. This requires handling various challenges like typos, varied specificity, and sometimes research-based mapping (e.g., using engine size and production year to ascertain a chassis code). eg: If a supplier provides 'BNW 316i saloon 1990-1994', (typo intentional) we would like to match it to our standardized value of 'BMW 3 Series (E36)'. Our old approach has been a combination of utilizing fuzzy matching for typos/basic matching and time consuming manual processing and verification. We have recently experimented with using GPT for providing guess…  ( 10 min )
    [P] Introducing PPO and Rainbow DQN to our super fast evolutionary HPO reinforcement learning framework
    Hi, we've just released a new version of AgileRL, our evolutionary hyperparameter optimisation framework built for RL that is 10x faster than SOTA. We've introduced PPO, Rainbow DQN, some sophisticated replay buffers, and also collaborated with the Farama Foundation to create some tutorials (more on the way). Please check it out and take it for a spin. We're also looking for contributors so get in touch if you would like to be involved! https://github.com/AgileRL/AgileRL submitted by /u/nicku_a [link] [comments]  ( 9 min )
    [P] Free open-source ML observability course: starts October 16 🚀
    Hi everyone, I’m one of the creators of Evidently, an open-source (Apache 2.0) tool for production ML monitoring. We’ve just launched a free open course on ML observability that I wanted to share with the community. The course covers: 📚 Key concepts of ML monitoring and observability (data drift, data and model quality metrics, etc.) 🔡 Monitoring unstructured data (embeddings, texts, LLMs, etc.) 🛠 Different deployment architectures (batch ML monitoring jobs, near real-time ML monitoring, etc.) The course is free and open. All materials are public, with no sign-up required. You’ll work with open-source tools like Evidently, MLflow, Airflow, and Grafana. We’ve already published the first 12 videos with notes and code examples. We’ll add new lessons and deployment blueprints over the following weeks. The official course start date is October 16, 2023. You can also learn at your own pace. Course info and notes: https://learn.evidentlyai.com/ [Background] We’ve been working on Evidently since late 2020 and have spoken to 100s of data scientists, ML engineers, and ML platform teams in different industries. In this course, we tried to sum up answers to the frequent questions on the topic. It starts with high-level theoretical modules and goes to complete deployment blueprints. It is approachable for different levels of knowledge, and you can pick only the modules you are interested in. Looking forward to meeting you at the course! submitted by /u/mllena [link] [comments]  ( 9 min )
    Can I use ArcPro to do machine learning on point (numeric) data? [D] [R]
    I am trying to do machine learning in ArcPro, and I want to understand the relationship between x, y, numeric variable 1, numeric variable 2, and one nominal variable (classified; i.e. can be one of four values). I'd like to be able to predict numeric variable 1 based on everything else. Can ArcPro accommodate machine learning for anything other than raster-type data? That is, can it be used to do machine learning on point (numeric) data? Thanks! submitted by /u/arcgis_123 [link] [comments]  ( 9 min )
    [R] TimeGPT : The first Generative Pretrained Transformer for Time-Series Forecasting
    In 2023, Transformers made significant breakthroughs in time-series forecasting. For example, earlier this year, Zalando proved that scaling laws apply in time-series as well, provided you have large datasets (and yes, the 100,000 time series of M4 are not enough - the smallest 7B Llama was trained on 1 trillion tokens!). Nixtla curated a 100B dataset of time-series and trained TimeGPT, the first foundation model on time-series. The results are unlike anything we have seen so far. I published the results in my latest article. I hope the research will be insightful for people who work on time-series projects. Link: https://aihorizonforecast.substack.com/p/timegpt-the-first-foundation-model Note: If you know any other good resources on very large benchmarks for time series models, feel free to add them below. submitted by /u/nkafr [link] [comments]
    [R] Pointers to (deep) latent variable models that admit analytical approximations
    Hi everyone. I am aware that there is a plethora of deep generative models out there (e.g. variational autoencoders (VAE), GANs) that can model high-dimensional data as the images of latent variables under a non-linear mapping (typically neural network). In more traditional methods such as probabilistic PCA, the latent variables can be marginalised analytically. In Bayesian PCA (BPCA), we can additionally integrate out the linear mapping, from the latent space to the observation space, by adopting the variational lower bound that leads to closed form updates of the parameters. The Gaussian Process Latent Variable (GPLVM) model adopts a non-linear probabilistic mapping (a Gaussian process) that can be marginalised. These two models enjoy to a certain degree analytical solutions concerning the inference of the latent variables and the mapping. I have been wondering whether there is any research into more "complex" models (perhaps I should call them deep) that are capable of modelling more complex data distributions than the GPVLM and BPCA, but retain analytical solutions when inferring the posterior of the latent variables (like BPCA) or the mapping (like GPLVM)? What I like about the GPLVM and BPCA is that they possess an objective function (i.e. ELBO) that can be analytically optimised, as opposed to the intractable objective of VAEs that necessitates Monte-Carlo averages and stochastic gradient. Could somebody please point me to such examples of more complex generative models that admit analytical inference for working out the posterior of the latent variables or the mapping? ----- This has also been posted on stack exchange: https://ai.stackexchange.com/q/42418/61537 submitted by /u/ngiann [link] [comments]  ( 9 min )
    [D] I love teaching! But I don't have enough publication for it, what should I do?
    Do I love teaching? Oh, absolutely, YES, a big YES! My time as a TA for countless semesters has been amazing. Staying after hours, spending long evenings and early mornings, to make each of my students find ease in debugging both easy-peasy and mind-boggling programs – it’s been a joy, truly. Watching those fresh faces, whom I introduced to Python in their first year (intro to programming lab), now immerse themselves in my computer vision labs, exploring computer vision and deep learning in their third/fourth year – it’s incredibly rewarding! And yeah, my students kind of like me! After each semester I get tons of emails thanking me, and my TAship reviews are always good. But, ugh, do I have enough publications to become faculty? A big fat NO! My efforts have been relentless, and everyone in my department would nod in agreement. But luck and reviewers? Not my best pals, apparently. So yeah, I don’t have a stack of 8 top-tier papers. I’ve managed to scrape together 3, and a few second tiers. My citation count is not that bad, somewhere between 200 and 300-ish. Now, what’s next for me? Dive into the industry? Become a high school teacher? Or perhaps a postdoc journey, fingers crossed for a sprinkle more luck and a few more papers? Edit: This doesn't mean I don't like research, I actually love it too. I have done quite a few internships at quite big companies; most of the time they extend my internship, and I even got a publication out of one in 5 months. But I just like to teach a lot! Strangely, I get social anxiety everywhere other than my classrooms/labs. submitted by /u/LongjumpingSchool646 [link] [comments]  ( 9 min )
    [D] You don't need a Vector Database you just need a database
    I'm seeing some architectures come out from the LLM world that probably wouldn't survive the trip to production. If you choose a vector database how will you handle your other database needs? Then you'll need 2 databases. https://bionic-gpt.com/blog/you-dont-need-a-vector-database/ submitted by /u/purton_i [link] [comments]  ( 9 min )
    [D] Why is back-propagation intractable for the MoCo key encoder?
    In the original MoCo paper, it says: "Using a queue can make the dictionary large, but it also makes it intractable to update the key encoder by back-propagation (the gradient should propagate to all samples in the queue)." First I thought the main reason bp cannot be applied to the key encoder is that the queue operation is not differentiable, but that seems not to be true: you can compute the gradient with respect to all samples in the queue, and then bp should run properly (see the code below). So WHAT IS THE REAL REASON that bp is intractable for the key encoder? In my opinion, it may be the large size of the queue (dictionary), which makes memory explode.
    ```python
    import torch
    import torch.nn as nn

    q = nn.Linear(768, 128)   # query encoder (toy stand-in)
    k = nn.Linear(768, 128)   # key encoder (toy stand-in)
    bs = 64                   # batch size
    ks = 4095                 # queue (dictionary) size
    model = nn.ModuleList([q, k])
    x = torch.randn(bs, 768)
    optim = torch.optim.SGD(model.parameters(), lr=0.01)
    criterion = nn.CrossEntropyLoss()

    def forward(x):
        xq = q(x)
        xk = k(x + 0.1)
        que = torch.rand(ks, 128)                        # stand-in for the queue of keys
        pos = torch.einsum("nc,nc->n", xq, xk)           # positive logits
        neg = torch.einsum("nc,kc->nk", xq, que)         # negative logits against the queue
        out = torch.cat([pos.unsqueeze(-1), neg], dim=1)
        t = torch.zeros(out.shape[0], dtype=torch.long)  # the positive is always index 0
        return criterion(out, t)

    loss = forward(x)
    loss.backward()
    optim.step()
    ```
    submitted by /u/whishtLF [link] [comments]  ( 9 min )
    [D] Advisor rejects every idea I propose.
    A senior PhD student at a moderately famous university. I have a reasonable number of accepted papers as first author in tier-1 conferences. I was thinking of going into academia, so recently I started proposing many ideas to my advisor so that I can mentor some junior students. However, my advisor rejects every idea I suggest, saying it won’t work. I’m feeling very dejected and I feel like I should give up on going into academia. I don’t know what I’m expecting from here. Is your advisor like this too? submitted by /u/mildlyphd [link] [comments]  ( 9 min )
  • Open

    Batch calibration: Rethinking calibration for in-context learning and prompt engineering
    Posted by Han Zhou, Student Researcher, and Subhrajit Roy, Senior Research Scientist, Google Research Prompting large language models (LLMs) has become an efficient learning paradigm for adapting LLMs to a new task by conditioning on human-designed instructions. The remarkable in-context learning (ICL) ability of LLMs also leads to efficient few-shot learners that can generalize from few-shot input-label pairs. However, the predictions of LLMs are highly sensitive and even biased to the choice of templates, label spaces (such as yes/no, true/false, correct/incorrect), and demonstration examples, resulting in unexpected performance degradation and barriers for pursuing robust LLM applications. To address this problem, calibration methods have been developed to mitigate the effects of t…  ( 93 min )
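    The core idea can be sketched in a few lines, under the assumption that batch calibration estimates the contextual bias as the mean per-class score over a batch of test inputs and subtracts it before prediction; details of the published method may differ.
    ```python
    import numpy as np

    def batch_calibrated_predict(log_probs):
        """Subtract the batch-mean class score (estimated contextual bias) before argmax."""
        bias = log_probs.mean(axis=0, keepdims=True)   # shape (1, n_classes)
        return np.argmax(log_probs - bias, axis=-1)
    ```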
  • Open

    Significance of AI in the development of software products
    Artificial Intelligence (AI) is emerging as a formidable force, revolutionizing how we conceive, create, and deliver software solutions. As technology advances at an unprecedented pace, the role of AI in this domain has become increasingly significant. It’s no longer just a buzzword; it’s a fundamental tool that promises to reshape the entire software development process.… Read More »Significance of AI in the development of software products The post Significance of AI in the development of software products appeared first on Data Science Central.  ( 19 min )
    Future of AI and data science – How to secure a bright career
    Companies, more often, pay attention to automation and innovation over proficiency and productivity. However, firms can maintain a balance between both due to the extensive usage of AI and data science programs. Here are the stats that show the impact of AI and data science in diverse sectors: Applications of AI and data science have… Read More »Future of AI and data science – How to secure a bright career The post Future of AI and data science – How to secure a bright career appeared first on Data Science Central.  ( 21 min )
  • Open

    From Skylines to Streetscapes: How SHoP Architects Brings Innovative Designs to Life
    At SHoP Architects, a New York City-based architectural firm, Mengyi Fan and her team aim to inspire industry professionals to create visual masterpieces by incorporating emerging technologies. Fan, the director of visualization at SHoP, has expertise that spans the fields of architectural visualization and design. She takes a definitive, novel and enduring approach to designing Read article >  ( 6 min )
  • Open

    Introducing PPO and Rainbow DQN to our super fast evolutionary HPO reinforcement learning framework
    Hi, we've just released a new version of AgileRL, our evolutionary hyperparameter optimisation framework built for RL that is 10x faster than SOTA. We've introduced PPO, Rainbow DQN, some sophisticated replay buffers, and also collaborated with the Farama Foundation to create some tutorials (more on the way). Please check it out and take it for a spin. We're also looking for contributors so get in touch if you would like to be involved! https://github.com/AgileRL/AgileRL submitted by /u/nicku_a [link] [comments]
    Masking state transitions in policy updates for invalid actions?
I am currently dealing with an environment that, most of the time (90% of all state transitions), clips the action selected by the agent, sometimes to the point where the action is completely ignored. This causes a lot of problems; for example, the entropy bonus does not work, since the agent learns to select any action when it doesn't matter anyway, but selects the same action (low entropy) when the actions do have an effect. Using the PPO algorithm, I was thinking of masking the state transitions in the policy updates according to how much the action was clipped in the environment. And I thought V(s) should be left unmasked, because the critic can still learn from the state transition even if the action was effectively ignored by the environment. submitted by /u/flxh13 [link] [comments]
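A minimal sketch of what that masking could look like in a PPO-style policy update (the mask and all names here are illustrative, not from any particular library):

import torch

def masked_ppo_policy_loss(ratio, advantage, mask, clip_eps=0.2):
    # ratio: pi_new(a|s) / pi_old(a|s); advantage: e.g. GAE estimates
    # mask: 1.0 where the env actually executed the agent's action,
    #       down to 0.0 where the action was fully clipped/ignored
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantage
    per_step = -torch.min(unclipped, clipped) * mask
    return per_step.sum() / mask.sum().clamp(min=1.0)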
    A question about deterministic action selection at evaluation time
    I'm training some agents using fairly vanilla PPO on a hand-made environment. These agents learn to perform the task pretty well, but while I was examining their action probabilities during an evaluation episode, I had the idea to turn off deterministic action selection. To my surprise, allowing probabilistic action selection (as opposed to argmax action selection) actually improved performance in some cases. I had always thought that deterministic actions during evaluation was fairly standard, but now am thinking that maybe I missed something and that there are cases where you wouldn't want determinism? My question is: how common is it actually to use deterministic actions vs. probabilistic ones at evaluation time, and does anyone know of studies/papers/examples where the authors found probabilistic evaluation to outperform determinism? submitted by /u/Impallion [link] [comments]
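For anyone comparing the two, the difference at evaluation time is just how the action is drawn from the policy head; a small illustrative sketch for a discrete policy:

import torch

def select_action(logits, deterministic=True):
    # logits: unnormalized action scores from the policy network
    if deterministic:
        return logits.argmax(dim=-1)                      # greedy / argmax evaluation
    dist = torch.distributions.Categorical(logits=logits)
    return dist.sample()                                  # stochastic evaluation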
    "A Simple Open-Loop Baseline for Reinforcement Learning Locomotion Tasks" Raffin et al. 2023
    submitted by /u/atooo57 [link] [comments]
    Looking for some advice regarding universal multi-head outputs
Hey, so I am working on a reinforcement learning package in C# (currently under heavy development): https://github.com/asieradzk/RL_Matrix My goal is to create something superior to Unity's ML-Agents for Godot, to democratize access to reinforcement learning (without people having to know what a tensor is). So far I've added some barebones DQN and PPO (which only output a single discrete action) as a proof of concept to test my code architecture. Now I am going through the daunting task of designing a universal workflow for setting up environments, for observations of any shape and any number of actions, both discrete and continuous. As I am finishing my multi-head multi-action output, I've come to realise that there are many possible architectures I could use to set up multi-head outputs, for instanc…
    Next state in turn based game
To my knowledge, when using the Q-learning family of algorithms, we must know the next state as well as the action space coupled with that observation in order to evaluate the value of the next state with the target network. But I have a problem when trying to define this next state in a turn-based game, in which the agent has to make a certain number of actions and then wait for the opponent to take some actions before it can interact with the environment again. Take Hearthstone as an example: each player has to wait for the other to play a number of cards before they can take any action. Currently, I have two options for this: - Treat the next state as the state right after the agent's turn ends, which lacks the action space. - Treat the next state as the state just before the agent's next turn begins, which has all the actions available to choose from, but this makes the agent's last action very noisy: that state could be a good state if the opponent plays badly, or the opponent could play very well and make our last decision look like a very bad choice. Thanks in advance for any suggestions. If my problem is a common task that others have already solved many times, I will be very thankful for the keyword. submitted by /u/No-Concentrate-6037 [link] [comments]
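One common way to frame this (a sketch under the assumption that the opponent is simply folded into the environment) is to store a transition only when it is the agent's turn again, so the stored "next state" is the observation at the agent's next decision point and the reward accumulates everything that happened in between; names and the replay-buffer interface below are illustrative:

class TurnBuffer:
    def __init__(self, replay_buffer):
        self.replay = replay_buffer      # assumed to provide .add(s, a, r, s_next, done)
        self.pending = None              # (state, action, accumulated_reward)

    def on_agent_turn(self, obs, action):
        if self.pending is not None:
            s, a, r = self.pending
            self.replay.add(s, a, r, obs, False)   # obs = next decision point
        self.pending = (obs, action, 0.0)

    def on_reward(self, r):
        # rewards earned while the opponent plays accrue to the agent's last action
        if self.pending is not None:
            s, a, acc = self.pending
            self.pending = (s, a, acc + r)

    def on_episode_end(self, final_obs, final_reward=0.0):
        if self.pending is not None:
            s, a, r = self.pending
            self.replay.add(s, a, r + final_reward, final_obs, True)
            self.pending = None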
    "Small batch deep reinforcement learning", Obando-Ceron et al 2023 {DM} (value-based agents explore & regularize better with small n)
    submitted by /u/gwern [link] [comments]
  • Open

    Neural Networks project
    Hi ! My group (4 people) has chosen to make an application that translates ancient stone inscriptions to modern languages as our university project . We can use external libraries to process images that we are going to translate but as we understood we have to build the neural network ourselves from scratch. My questions are 1) is this possible to do within 10 months? 2) if so how would you approach it ? submitted by /u/sakith123 [link] [comments]

  • Open

    How are memories stored in neural networks? | The Hopfield Network #SoME2
    submitted by /u/keghn [link] [comments]
    A question
How does a neural network process inputs that are the same but presented differently to the model? submitted by /u/Sith_vader3 [link] [comments]
I don't know much about NNs. Is this correct?
I gave ChatGPT Vision an illustration of a neural network from The Principles of Deep Learning Theory. I want to know how correct its response is. Here is the response: https://preview.redd.it/inqe5xukxptb1.png?width=453&format=png&auto=webp&s=6e1079baeae8235b0e03a677e4006d1077af36a8 submitted by /u/YeshwanthRam [link] [comments]
  • Open

    Who Will Benefit from AI?
    Artificial intelligence (AI) can provide "machine usefulness" for human workers, augmenting their jobs rather than replacing them. However, there is a concern that AI could lead to job displacement and reinforce economic inequality. MIT economist Daron Acemoglu emphasizes the importance of making AI more useful to humans and ensuring that the economic benefits are shared widely. He suggests that innovations that augment workers' tasks can lead to prosperity for the workforce. Acemoglu also highlights the need for worker power and the careful implementation of technology to achieve shared prosperity and productivity gains. Source : https://idss.mit.edu/news/who-will-benefit-from-ai/ submitted by /u/NuseAI [link] [comments]
    What's the most advanced free chatbot available?
I just need three things from it: It must be knowledgeable about things such as physics, math, history, books, geography, etc. It also must be original, with a high level of SEO and AI-detection score. It must be available in Italy. The last part is essential. Claude 2 is very famous, but with SMS verification from the USA (which I don't have, and I don't want to give credit card info/pay to have) it's made almost impossible even with a VPN. submitted by /u/luigirovatti1 [link] [comments]
    10 Powerful ChatGPT Hacks for SEO
    submitted by /u/Senior_tasteey [link] [comments]
    ChatGPT's Global Peace Plan
    Creating true, enduring, lasting peace on Earth is an ambitious and complex endeavor that requires multifaceted approaches. Here’s a bold, outside-the-box plan that may surprise you: Step 1: Establish a Global Consciousness: Educational Overhaul: Revamp global educational systems to foster empathy, understanding, and appreciation for diverse cultures, religions, and viewpoints from a young age. Step 2: Eradicate Poverty and Inequality: Universal Basic Assets (UBA): Implement a Universal Basic Assets program, where every person on Earth is granted a share of global resources. Step 3: Create a Single Global Governance Entity: World Federation: Establish a democratically elected World Federation that respects regional autonomy but has overriding authority on global issues like…
    When your AI says she loves you
    submitted by /u/thisisinsider [link] [comments]
    Anyone ever thought about training a video generating model, but backwards?
    Just had a random idea: What if you train a video generating AI, but feed it videos that are reversed? You could show it an image of a crashed car, and it would generate a video of the crash. Show it a broken vase, it would "repair" it. It could one day become like the "reconstruct crime scene" in Detroit: Become Human. What are your thoughts about this? submitted by /u/FluffyIllustrator805 [link] [comments]
    AI and science: what 1,600 researchers think
    A Nature survey of over 1,600 researchers reveals that AI tools are becoming increasingly common in science and are expected to be 'very important' or 'essential' in the next decade. Scientists express concerns about how AI is transforming research, including reliance on pattern recognition without understanding, bias in data, fraud, and irreproducible research. The survey shows that AI tools provide faster ways to process data, speed up computations, and save time and money. Among researchers who use AI, more than one-quarter believe AI tools will become 'essential' to their field in the next decade. Large language models like ChatGPT are mentioned as both impressive and concerning examples of AI tools in science. Source : https://www.nature.com/articles/d41586-023-02980-0 submitted by /u/NuseAI [link] [comments]
    Looking for AI text input like Artbreeder Mixer that combines images
I'm looking for a (free) AI image generator like Artbreeder Mixer that has functions that let you "morph" or mix images together via text prompts. I've looked at a bunch already, and even tried adding the text of the different types in the prompts, but I've been getting separate results (e.g. "cat", "man", "head" won't combine the man and the cat, but rather gives me un-morphed results, like a regular man plus a cat in a suit with no human features; I even got a result with a man standing behind a cat!). I've tried StarryAI, imagecreator, wepik, and some others I can't remember that had no mixing; I can't afford Midjourney or the paid ones right now. In Artbreeder's interface you can keep adding images and it will mix them together. I made these images and others like them very easily in Artbreeder, but its plan is very limited - I could buy more credits, but I need to wait a few days (new job, not paid yet, broke today... lol): morph between man and donkey; morph between angry rapper and gorilla. So, if anyone can suggest something free, or almost free (generous newbie credits?), that can do mixes like this - please point me in the right direction. submitted by /u/magusat999 [link] [comments]
    New York wants to be AI's world capital, in rivalry with San Francisco and Silicon Valley
    submitted by /u/norcalnatv [link] [comments]
    Could an AI-created profile picture help you get a job?
    Artificial intelligence (AI) is being used to create professional-looking profile pictures for job hunting websites like LinkedIn. Apps like Remini, Try It On AI, and AI Suit Up use AI-based software to generate slick profile photos that mimic the work of expert photographers. Users upload multiple selfies, and the AI software creates artificial photos with different hairstyles, clothing, and backdrops. While some find the results realistic, others think they look artificial. The AI services are popular because they are cheap or free, making them accessible to those who can't afford professional headshots. However, opinions are divided on whether AI-generated photos are beneficial or detrimental to self-esteem. Some believe that AI-generated photos allow individuals to put their best self forward and potentially increase their chances of being considered for opportunities. Others worry that relying on AI-generated photos may negatively impact self-worth and confidence. Recruiters generally do not consider whether a photo is AI-generated when evaluating job applications. Source : https://www.bbc.co.uk/news/business-67054382 submitted by /u/NuseAI [link] [comments]
    AI Tool for film footage notes
Hi, I'm currently filming a documentary, but I'm so busy filming that I don't have time to write notes on the footage for the editor. Does anyone know of an AI tool that can help with this, save time, and streamline the process? Kind regards submitted by /u/Brand0n_C [link] [comments]
    How AI will affect traditional and open source software industry?
Hey folks, how do you see the effect of AI playing out? Will small software companies go bankrupt? Lots of software products are already using tools like ChatGPT, Midjourney, etc., and this is just the start of a new AI technology era that will evolve over the years. In time we will see more and more AI software that will likely provide more efficient and better solutions compared to traditional and open source software. So my question is: how do you see this? Are the days of small software companies and open source software projects numbered? submitted by /u/Haziq12345 [link] [comments]
    One-Minute Daily AI News 10/11/2023
    Opera has launched Opera One — a new version of the browser that comes packaged with an AI-powered chatbot called Aria.[1] Adobe is going all in on AI, announcing three new generative AI models today that add powerful features to Illustrator and Adobe Express and vastly improve Photoshop’s text-to-image capabilities.[2] ‘South Park’ to Tackle AI for Next Event Special, Releases Teaser.[3] World’s first AI tutor launched in Australia to help students get through their exams.[4] Sources: [1] https://www.theverge.com/2023/6/21/23768888/opera-one-browser-aria-ai-assistant-chatbot [2] https://www.theverge.com/2023/10/10/23911114/adobe-max-firefly-generative-ai-model-photoshop-illustrator-express [3] https://www.hollywoodreporter.com/tv/tv-news/south-park-ai-joining-panderverse-1235615276/ [4] https://www.techguide.com.au/news/computers-news/worlds-first-ai-tutor-launched-in-australia-to-help-students-get-through-their-exams/ submitted by /u/Excellent-Target-847 [link] [comments]
    Cypher 2023: The Future of Simulation and Design is AI
    submitted by /u/Agitated-Spell3979 [link] [comments]
    Any ideas how this was created?
    submitted by /u/crispyTacoTrain [link] [comments]
    Web design tools
    I’m looking for input and advice on tools for web designers. I use Wordpress a lot, Magento some and frequently code by hand in html JavaScript and PHP. I know there are some AI tools out there now but I don’t know which are best and wanted to find out what people thoughts are on this subject. What tools are you using, for what, and why? Thanks! submitted by /u/PowerTarget [link] [comments]
  • Open

    [R] Researchers Identify Emergent Linear Structures in How LLMs Represent Truth
    LLMs' tendency to make up false statements (hallucinate) is a major concern. We need ways to inspect whether they really "know" something is true or not so we can reduce hallucinations. In a new paper, researchers found that LLMs contain an internal "truth vector" - an emergent linear structure that represents factual truth values. They had the insight to visualize how GPT represents simple true/false sentences. The true ones clustered together, while false ones clustered elsewhere - suggesting some kind of 'truth direction' in its learned representations. To test this, they trained linear "probes" on one dataset, and found they could generalize to accurately detect truth values in totally different datasets about other topics. They also directly modified the models to add or subtract the identified truth vectors from its processing of statements. This could flip assessments of truth value, showing the vector causally influences reasoning. Together, these findings provide evidence that neural networks can create emergent, linear structures that represent factual truth. This finding could eventually help make AI systems less prone to hallucinations and falsehoods. TLDR: LLMs can create emergent linear representations of truth. This sheds light on how AI represents abstract concepts and could help us reduce hallucinations. Full summary. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
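A minimal sketch of the probing idea (illustrative, not the authors' exact setup): fit a linear classifier on hidden activations of true vs. false statements and check whether it transfers to a held-out topic; the placeholder arrays below stand in for activations extracted from a real model.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Placeholders for hidden-state vectors extracted from an LLM at some layer;
# in practice these come from running true/false statements through the model.
acts_train = rng.normal(size=(200, 768))
labels_train = rng.integers(0, 2, size=200)
acts_heldout = rng.normal(size=(50, 768))
labels_heldout = rng.integers(0, 2, size=50)

probe = LogisticRegression(max_iter=1000).fit(acts_train, labels_train)
print("held-out topic accuracy:", probe.score(acts_heldout, labels_heldout))

# The learned weight vector is a candidate "truth direction".
truth_direction = probe.coef_[0] / np.linalg.norm(probe.coef_[0])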
    [D] Recommendations request for a guide to research publication
I am working on a research topic in Data Engineering. Forgive me if this is a frequently asked question; I couldn't find it specifically in the FAQ. What are good publication tips and journals to publish in? I read through a few journals and all of them are big publications. What if I opt for some up-and-coming or niche (maybe data engineering) journals? submitted by /u/Sherbhy [link] [comments]  ( 9 min )
    [R] SWE-bench: Can Language Models Resolve Real-world GitHub issues?
    We have a new benchmark out called SWE-bench (arxiv) It challenges LMs to solve real GitHub issues (feature requests & bug reports) from popular Python repos. Answers are validated using unit tests we crawled from those repos. The benchmark at swebench.com/ shows that even the strongest models, such as Claude 2 and GPT-4, get less than 5% accuracy. ​ We are here to answer any questions you may have. submitted by /u/ofirpress [link] [comments]  ( 9 min )
    [D] Sample probability diffusion models
I would like to understand how I can calculate the probability that a sample belongs to the distribution a diffusion model was trained on. Say I have an image of a car, and I would like to know whether this image belongs to the distribution estimated by the diffusion model; that is, I would like a probability between zero and one that the car belongs to this distribution. Do you know how I can technically do this? submitted by /u/That_Phone6702 [link] [comments]  ( 9 min )
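One common approach (a sketch, assuming a standard DDPM-style noise-prediction model with the usual model(x_t, t) -> predicted-noise interface) is to estimate a per-sample likelihood proxy from the variational bound by averaging the denoising error over random timesteps. Note this gives a relative score rather than a calibrated 0-to-1 probability, so in practice you would compare it against the scores of known in-distribution images.

import torch

@torch.no_grad()
def nll_proxy(model, x0, alphas_cumprod, n_samples=64):
    """Rough per-sample score: average noise-prediction error over random t.
    x0: (1, C, H, W) image scaled to [-1, 1]. Lower score = more likely
    under the model (approximates the diffusion ELBO up to weighting/constants)."""
    T = alphas_cumprod.shape[0]
    total = 0.0
    for _ in range(n_samples):
        t = torch.randint(0, T, (1,))
        a_bar = alphas_cumprod[t].view(1, 1, 1, 1)
        noise = torch.randn_like(x0)
        x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise
        total += torch.mean((model(x_t, t) - noise) ** 2).item()
    return total / n_samples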
    [Discussion] Making a Tutorial for Using a New Platform for ML in the climate and earth science space
    Hey guys Looking for some ideas. I'm building out a jupyter book that will be a tutorial on how to use a research platform for data analysis and modelling. My PI has given me free liberty over it. I can not think of a good idea to do the analysis and build the model on. It does not need to be complex but should be good enough so that any researcher, student or organization using the platform can get a good idea of how to use it for ML. Any thoughts on a good area to look into? Any recommendations? Note this will be a tutorial and as such an overly complex model is unnecessary. I just can not figure out what to look into so hoping you guys could give thoughts about possible areas in climate, weather and earth science that I could focus on for the tutorial in the jupyter book. submitted by /u/AdditionalFun3 [link] [comments]  ( 9 min )
    [D] Submitting a paper rejected by EMNLP to ARR
    First time submitting to ARR here. I was quite confused about this paper resubmission thing. I got rejected by EMNLP (submission directly to EMNLP with openreview) a week ago and I am planning to resubmit it to the ARR system (also using openreview). Does this EMNLP submission count as a previous ARR submission that should be mentioned or not? Do I need to withdraw the paper from EMNLP openreview prior to submitting it to ARR openreview? submitted by /u/Icy-Distribution6887 [link] [comments]  ( 9 min )
    [D] [P] UI-based AI agents: UI-Act
    Hi! Happy to share a project I've been working on for a while: UI-Act https://github.com/TobiasNorlund/UI-Act It's an AI model architecture designed to autonomously navigate and interact with computers using the graphical user interface. Think of it as a co-pilot that "sees" your screen and acts on it, just as a human would. In essence, it's a custom transformer model taking prompt and screenshots as input, with output heads to predict low-level actions i.e. mouse clicks. In the demo, it has been trained to compute simple expressions in a calculator window, using expert demonstrations/behavior cloning. If scaled up appropriately however, it could provide a basis for a general agent to automate arbitrary tasks on a computer. I would be interested in hearing your thoughts on it, and especially with regards to the trend towards general AI agents and assistants (Windows Copilot / Adept ACT-1 / AutoGPT etc). LMs equipped with e.g. function-calling is a trendy approach, that rely on text-based state representations and APIs to take action. In cases where this is unfeasible, UI-based agents might provide a more general alternative. As the agent's interface to the computer is shared with humans, it can be easily taught using expert demonstrations, and require little or no technical expertice. Let me know what you think! submitted by /u/tobibbelfuel [link] [comments]  ( 9 min )
    [P] Learn how to make trustworthy and transparent machine learning models in Tsetlin Machine Book Chapter 7: Confidence, Trustworthiness, and Composites.
    ​ Confidence and trustworthiness of Tsetlin Machines. Hi all! Just completed a new chapter in the book An Introduction to Tsetlin Machines: https://tsetlinmachine.org Happy to receive feedback! Abstract: Collaboration can be essential to manage complex projects. One example is building a house. You then need the expertise of carpenters, plumbers, and electricians. Each profession brings unique skills to the table. Similarly, different types of Tsetlin machines can have distinct capabilities. In this chapter, you learn how Tsetlin machines can team up, allowing them to achieve more than they could on their own. The effectiveness of a team relies on recognising each member's strengths and limitations. Appreciating where your expertise stops and where your coworkers' expertise begins is crucial for effective collaboration. We first explore how Tsetlin machines can assess their competence in Section 7.1. Using the vote count from Chapter 1, you learn to measure how confident a Tsetlin machine is when it makes its decisions. It is possible to be highly confident and still perform poorly. To be trustworthy, confidence must be in line with one's capabilities. Therefore, Section 7.1 also covers how to evaluate trustworthiness. Next, in Section 7.2, you discover how to build a team of Tsetlin machines with different skills. By assessing each Tsetlin machine's confidence, you can lean on the confident ones when making decisions. The result is a Tsetlin machine composite - a construction where multiple Tsetlin machines join forces. You can think of it as a composite material, such as epoxy, which reinforces resin with fibres, making it strong, lightweight, and durable. submitted by /u/olegranmo [link] [comments]  ( 9 min )
    [R] [D] Need Peer Review: Unsupervised Learning for Student Dropout Anomaly Detection
    Hello all, Just wrapped up Task 1.1 for anomaly detection in student dropout rates. Keen for some extra eyes on it. Task Highlights: Data Pre-processing & Normalisation K-Means Clustering Gaussian Anomaly Detector Used PCA for dimensionality reduction Links to the following files: data.csv Task 1.1 - Rubric.pdf Task1.1Script.ipynb https://drive.google.com/drive/folders/17XcjEoYCrDWqf90VVNdkLAkYNdtWWwGu?usp=sharing Would greatly appreciate any feedback! Cheers! submitted by /u/Nook31 [link] [comments]
    [R] A method to assess trustworthiness of machine coding at scale
    submitted by /u/mnky9800n [link] [comments]  ( 8 min )
    [P] [vilays] Prototype Video Demo - Any Feedback from ML Engineers?
    Hi everyone, I’m thrilled to share a prototype we've been tirelessly working on. We are developing a virtualization environment for applications, specifically tailored to engineers, designers, data scientists, and researchers. In a nutshell, our platform enables users to run cloud-hosted desktop apps from any device, making it appear as if the applications are installed on their local machines, while they're actually operating on a remote server. The ultimate goal is to obliterate barriers between local and cloud execution, especially for compute-intensive workloads, thereby allowing seamless usage of High-Performance Computing software on the cloud with the scalability to adjust computing resources as per necessity. We’re here to solicit your invaluable feedback on our product video demo. Your insights will not only help us identify any blind spots and enhance our solution but also better understand the needs and preferences of our potential user base. 📽 [https://youtu.be/QR8FWRnPrXM?feature=shared] We're eagerly awaiting your thoughts and appreciate you taking the time to help us refine our product! Thank you! :) submitted by /u/aaron-cesaro [link] [comments]
    [D] Databricks Dolly 15k - Creating Synthetic Variants
Hey all, I found Dolly to be a very interesting project when it was released, but I'm curious if it has similar value today, because a lot of synthetic data generation options seem to be popping up. Now, it seems like Dolly is human generated/curated by over 5k employees (which is great), but wouldn't it be a better approach now to have Llama 70B (or maybe Falcon) just generate future variants of the 15k rows? I haven't been able to figure out why we aren't seeing more synthetic datasets like this on HF. Is the bottleneck licensing, compute, or just incentive? Here's the original Dolly post thread: https://www.reddit.com/r/MachineLearning/comments/120usfk/r_hello_dolly_democratizing_the_magic_of_chatgpt/ submitted by /u/buzzyness [link] [comments]
    [D] Please suggest a Loss function for image to image task.
What loss function should be used for a task that takes an input image with a lot of haze and produces an image with reduced haze? The architecture is a simple encoder-decoder. I tried MSE, as some articles and ML guides say it is good for pixel-wise comparison, and also tried categorical cross-entropy, but neither works very well. MSE works but produces artefacts like red/green/blue spots and splatters, and at worst it produces a white image. The research on this task includes SIDNet [Single Image Dehazing Net], transmission maps, the dark channel prior algorithm, FFA-Net, etc., trained on the benchmark datasets (RESIDE, SOTS). I aim to create a simple architecture for a college project, so I chose the encoder-decoder architecture. Any suggestions are appreciated. submitted by /u/Wild_Basil_2396 [link] [comments]
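For image-to-image restoration like dehazing, one common choice (a sketch, not a prescription, and assuming a recent torchvision) is a pixel loss less prone to blur and splotches than plain MSE, such as L1, optionally combined with a perceptual loss computed on VGG features:

import torch
import torch.nn as nn
from torchvision import models

class DehazeLoss(nn.Module):
    def __init__(self, perceptual_weight=0.1):
        super().__init__()
        # Frozen VGG16 feature extractor for the perceptual term
        vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:16].eval()
        for p in vgg.parameters():
            p.requires_grad_(False)
        self.vgg = vgg
        self.l1 = nn.L1Loss()
        self.w = perceptual_weight

    def forward(self, pred, target):
        # pred/target: (N, 3, H, W) in [0, 1]; ideally ImageNet-normalized
        # before the VGG branch for a more meaningful perceptual term.
        pixel = self.l1(pred, target)
        perceptual = self.l1(self.vgg(pred), self.vgg(target))
        return pixel + self.w * perceptual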
    [D] Startup team demonstrates differentiable Swift compiler outrunning TensorFlow by 322X
Autonomous systems startup PassiveLogic assembled a differentiable computing team to build a fast systems language with native-performance differentiability. Their latest benchmark trains networks two orders of magnitude faster than PyTorch and TensorFlow. See: LinkedIn Post. It's a collaborative effort with the Swift community and Apple's compiler team, using the Swift language as a strongly typed embedded language that performs ahead-of-time compilation of graph neural nets. The focus is on fusing systems programming and AI engineering into a single native high-performance language, to enable typed heterogeneous inference and training. The compiler development is open sourced as part of the standard Swift package. Try it yourself at swift.org. submitted by /u/taharvey [link] [comments]  ( 9 min )
    [D] How is test-driven development implemented in the context of machine learning?
I recently tried to refactor a previous project that I had, but I realized that after making all of the changes the performance wasn't reproducible anymore. I decided to start from scratch, make incremental changes, and make sure that the model's performance is maintained with each change. Very basic in hindsight, but I guess I was too hasty with coding. Anyway, running the full model's training and evaluation with each change is proving to take too long. I'm curious if there's any other way that people implement TDD in the context of machine learning, since training runs tend to be far more time consuming than typical applications. submitted by /u/Seankala [link] [comments]
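One pattern people use (a sketch with illustrative names, not a standard) is a fast, seeded smoke test on a tiny problem that asserts the model can still overfit, so a refactor that silently breaks learning is caught without rerunning full training:

import torch

def train_tiny(model, data, steps=200, lr=0.1, seed=0):
    torch.manual_seed(seed)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    x, y = data
    for _ in range(steps):
        loss = torch.nn.functional.mse_loss(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

def test_model_can_overfit_tiny_batch():
    # Deterministic, seconds-long check that the training loop still learns.
    torch.manual_seed(0)
    x = torch.randn(32, 10)
    y = x.sum(dim=1, keepdim=True)
    model = torch.nn.Linear(10, 1)
    initial = torch.nn.functional.mse_loss(model(x), y).item()
    final = train_tiny(model, (x, y))
    assert final < 0.1 * initial   # loss must drop by at least 10x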
  • Open

    Developing industrial use cases for physical simulation on future error-corrected quantum computers
    Posted by Nicholas Rubin, Senior Research Scientist, and Ryan Babbush, Head of Quantum Algorithms, Quantum AI Team If you’ve paid attention to the quantum computing space, you’ve heard the claim that in the future, quantum computers will solve certain problems exponentially more efficiently than classical computers can. They have the potential to transform many industries, from pharmaceuticals to energy. For the most part, these claims have rested on arguments about the asymptotic scaling of algorithms as the problem size approaches infinity, but this tells us very little about the practical performance of quantum computers for finite-sized problems. We want to be more concrete: Exactly which problems are quantum computers more suited to tackle than their classical counterparts, an…  ( 94 min )
  • Open

    UK Tech Festival Showcases Startups Using AI for Creative Industries
    At one of the U.K.’s largest technology festivals, top enterprises and startups are this week highlighting their latest innovations, hosting workshops and celebrating the growing tech ecosystem based in the country’s southwest. The Bristol Technology Festival today showcased the work of nine startups that recently participated in a challenge hosted by Digital Catapult — the Read article >  ( 6 min )
    Get in Gear: ‘Forza Motorsport’ Races Onto GeForce NOW
    Put the pedal to the metal this GFN Thursday as Forza Motorsport leads 23 new games in the cloud. Plus, Acer’s Predator Connect 6E is the newest addition to the GeForce NOW Recommended program, with easy cloud gaming quality-of-service (QoS) settings built in to give Ultimate members the best streaming experience. No Breaks, No Limits, Read article >  ( 6 min )
  • Open

    DeepMind 2022 'full accounts' financial report: 2022 budget: £1,081 million ($1.3b) (decreased by a fifth from 2021)
    submitted by /u/gwern [link] [comments]
    RL for non-Python environments?
Most real-world applications for RL (robotics, game dev, finance) are not normally done in Python, yet all major RL frameworks are written in Python. Is there a good/high-performance cross-language framework for doing RL in other languages like C++/.NET/Java? If not, do you think people would be interested in such a framework? submitted by /u/xor24 [link] [comments]
    Reinforcement learning agents that adhere to a causal model of the problem
    Do you know any work that tries to develop RL agents that exploit some sort of high-level model of the problem (it could also be given by an expert human) to learn faster or operate on out-of-distribution scenarios? I'm particularly interested in Causal Models, but any similar thing could be interesting for me submitted by /u/fedetask [link] [comments]
What is the intuitive explanation for using log probabilities in policy gradient methods instead of plain probabilities? Does it improve gradient descent optimization?
    submitted by /u/aabra__ka__daabra [link] [comments]
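For context, the standard score-function (REINFORCE) estimator is written with log-probabilities, which turns products of probabilities into sums and keeps gradients numerically well behaved; a minimal sketch of the resulting loss (names are illustrative):

import torch

def policy_gradient_loss(logits, actions, returns):
    # logits: (T, A) policy outputs; actions: (T,) int64 taken actions; returns: (T,)
    log_probs = torch.log_softmax(logits, dim=-1)
    chosen = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    # Maximizing E[log pi(a|s) * R]  <=>  minimizing the negative mean below
    return -(chosen * returns).mean()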
    Why does Drq-v2 sample from replay by episode then experience?
    I've been looking at DrQ-v2 (https://github.com/facebookresearch/drqv2) recently and it samples from replay in a way that seems odd to me but may have a purpose I don't understand. They store experiences in a compressed file by episode, this makes some sense since it means they don't have to store everything in RAM and they delay disk writes until the end of the episode so they don't slow down the sim operation. On sampling, they randomly select an episode then randomly select an experience from the episode, calculating the n-step reward dynamically at sample time instead of at experience storage time. This is then fed to the model by a pytorch DataLoader. This means a _lot_ of disk reads during the optimization step which can't be ideal but I'll put that aside. What is the advantage of doing this selection by episode? It may give a better spread across episodes in each update, but I'm not sure that makes up for the potential downsides of making prioritization and other replay tricks much harder. Any ideas? submitted by /u/EDMismyO2 [link] [comments]
    Can reinforcement learning models learn to rank?
I have a very simple observation: a list of random values, state = [random.uniform(-0.2, 0.2) for _ in range(200)], with reward = state * actions. The reward is not using the next state, it's using the previous state I gave to the model. So basically I already give the answer to the model; the best action is: if state > 0, action = 1; if state < 0, action = -1. I tried using PPO, but it doesn't seem to learn at all. My test_env.py is here:

import gymnasium as gym
import numpy as np
from gymnasium import spaces
from gymnasium.utils import seeding
from stable_baselines3.common.vec_env import DummyVecEnv
import random

class TestEnv(gym.Env):
    metadata = {"render.modes": ["human"]}
    def __init__(
        self,
        item_count,
        test_steps,
        is_train = True,
    ):
        self.is_train = is_train
        self.test_steps = test_step…
  • Open

    Microsoft at VL/HCC 2023: Focus on co-audit tools for spreadsheets
    These research papers were presented at the IEEE Symposium on Visual Languages and Human-Centric Computing (opens in new tab) (VL/HCC 2023), a premier forum for design, theory, and application of computing technologies for programming, modelling, and communication. Large language models (LLMs) have revolutionized the way novice programmers and everyday computer users tap into the capabilities […] The post Microsoft at VL/HCC 2023: Focus on co-audit tools for spreadsheets appeared first on Microsoft Research.  ( 10 min )
  • Open

    Homework problems are rigged
    This post is a follow-on to a discussion that started on Twitter yesterday. This tweet must have resonated with a lot of people because it’s had over 230,000 views so far. You almost have to study advanced math to solve basic math problems. Sometimes a high school student can solve a real world problem that […] Homework problems are rigged first appeared on John D. Cook.  ( 7 min )
  • Open

    12 Generative AI Trends to Watch Out for
    The advent of generative AI is empowering everyone alike – organizations, small businesses, individuals, students, and medical professionals, to name a few. The last couple of years have been revolutionary for artificial intelligence innovation and transformation. How will 2024 shape up for AI, AI tools, and related professionals? Let’s analyze the trends that are most… Read More »12 Generative AI Trends to Watch Out for The post 12 Generative AI Trends to Watch Out for appeared first on Data Science Central.  ( 20 min )

  • Open

    Predictive AI analyzing attraction to facial features (iris Dating app)
    Top dating apps Tinder, Hinge and Bumble have all stated that they're already investing in AI to make their apps better. They're using it to verify profiles, match people based on bios and interests, and help generate profile descriptions and liven conversations. But what about machine learning on user photos? iris Dating uses AI to analyze user input in the form of liking or disliking faces ("swiping" profiles). We all know if we like blondes or brunettes, blue or brown eyes, short or long hair, beard or no beard, etc. But AI can pick up the subtlest features (proportions, distances, curvatures etc.) and build a face map. A matrix of features, if you will. It doesn't just look for a person looking like your favorite celebrity crush. It understands what you're really attracted to. From there it's an easy path: if it knows which features attract me, it can predict my level of attraction to a specific individual (specifically, their face). Find the persons with the highest predicted attractiveness (for me, not for everyone), rank them by attraction for me, and we have a potential high mutual attraction match. The two stats I have are that on average women like 55%(!) of the profiles iris picks for them; and that users have 40x higher chances of matching when they've trained the model to understand their taste. I know it takes a lot more than a pretty face to make for a great relationship, but it sure doesn't hurt to start with strong physical attraction. Missed connections on Craigslist are about just that: seeing a face you can't forget. Find me more of these "wow" faces and let's go from there. What do you think? Is it too early? Too bold? Too niche? submitted by /u/akahamlet [link] [comments]
    Superman if portrayed by different actors (as imagined by AI)
    submitted by /u/fat_n_stupid [link] [comments]
    DALL·E 3 is blocking copyrighted material. Also DALL·E 3:
    submitted by /u/Zimmax [link] [comments]
    The AI research job market shit show
    The AI research job market is going through a shakeup, with a high demand for skilled researchers and a scarcity of talent. Companies closely monitor the movements of researchers as an indicator of their ability to transition from concept to product. The market is highly competitive, with researchers being offered high salaries and compensation packages. This has led to high turnover and attrition in many companies, causing unsettledness among employees. Despite the challenges, the investment in AI research is expected to drive innovation and push the boundaries of the Transformer architecture. Source : https://www.interconnects.ai/p/ai-research-job-market submitted by /u/NuseAI [link] [comments]
    Are there any low res (pixel art) art tools?
    I'm looking for ways to create art for a game I'm creating. submitted by /u/Yenii_3025 [link] [comments]
    Inverting Transformers Significantly Improves Time Series Forecasting
    Transformers are great at NLP and computer vision tasks, but I was surprised to learn they still lag behind simple linear models at time series forecasting. The issue is how most Transformer architectures treat each timestamp as a token and fuse all the variable data from that moment. This makes two big problems: Variables recorded at slightly different times get blurred together, losing important timing info Each token can only see a single moment, no long-term dependencies So Transformers struggle to extract useful patterns and correlations from the data. Some researchers from Tsinghua University took a fresh look at this and realized the Transformer components themselves are solid, they just need to flip the architecture for time series data. Their "Inverted Transformer" (or iTransformer): Makes each variable's full history into a token, instead of each timestamp Uses self-attention over variables to capture relationships Processes time dependencies per variable with feedforward layers This simple tweak gives all the benefits we want: State-of-the-art forecasting accuracy, beating both linear models and standard Transformers Better generalization to unseen variables Increased interpretability Ability to leverage longer historical context TLDR: Inverting Transformers to align with time series structure allows them to outperform alternatives in working with time series data. Full summary. Paper is here. submitted by /u/Successful-Western27 [link] [comments]
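A rough sketch of the inversion described above (illustrative, not the authors' code): each variable's full history becomes one token via a linear embedding over time, self-attention runs across variables, and a per-variable head produces the forecast.

import torch
import torch.nn as nn

class InvertedEncoder(nn.Module):
    def __init__(self, seq_len, d_model=128, n_heads=4, horizon=24):
        super().__init__()
        self.embed = nn.Linear(seq_len, d_model)       # one token per variable
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, horizon)        # per-variable forecast

    def forward(self, x):
        # x: (batch, seq_len, num_vars) -> tokens: (batch, num_vars, d_model)
        tokens = self.embed(x.transpose(1, 2))
        tokens = self.encoder(tokens)                  # attention across variables
        return self.head(tokens).transpose(1, 2)       # (batch, horizon, num_vars)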
    Best ChatGPT Plugins: Ultimate List for 2023
    submitted by /u/Senior_tasteey [link] [comments]
    The NSFW dream (truely unrestricted ai desires)
I guess I'm looking for the impossible, but does anyone know of a generator that has all of the following qualities, in order of importance from least to most important: Has a massive variety of styles like Womba's private Discord server does. A "create variants" function like what a Womba Discord personal server generator allows you to do. Generates beautiful "digital art" style images like https://www.unstability.ai/ does (man, those images are pretty); faces are really good most of the time. (It's frustrating as it looks so good but I can't seem to get any group sex poses going on.) Provides a variety of poses, such as https://easywithai.com/ai-image-generators/promptchan-ai/ which also allows you to upload your own images for poses, like how I could upload a real-life orgy image, and as long as it can distinguish the bodies as being separate (not a big pile of limbs) it does pretty well, but it severely lacks in facial quality. Like a big booty girl in hyperreal style at 1080p or higher resolution. (Again Womba is good here, but they are just extreme in their restrictions.) 1080p should be the minimum for any paid service, as how can we truly enjoy a full-screen image on anything less without it pixelating? Doesn't cost $150/month (yes, I found one that does all this, but their premium subscription costs like $150/month (seduced.ai) and it's not even unlimited). I paid $90 for a full year of Womba Discord unlimited but again, $150/month is just not worth it. If anyone knows of a service that has all these for around $25/month or less, please let me know. I'd really appreciate it. submitted by /u/russader [link] [comments]
    Can AI reference both photos to make the black and white photo the same as the colour image?
    I have a high resolution black and white print and a generic quality colour image of the same photo, that I'd like AI to look at both images and make the B&W into colour. Is this possible? submitted by /u/NikonD3X1985 [link] [comments]
    AI Morality Scenarios.
    submitted by /u/Philipp [link] [comments]
    One-Minute Daily AI News 10/10/2023
    Cybersecurity firm Avast is calling out a long-lived tool “LoveGPT,” that has haunted popular dating apps and that has been upgraded with artificial intelligence, gaining the ability to build fake profiles and manipulate unsuspecting users.[1] The outsider told the WSJ that Microsoft used AI from its partner OpenAI, which was then used to launch GitHub Copilot at $10 per month, but lost $20 per user in the average six months on average in the first 2023. Some Copilot users cost as much as $80 per month.[2] SK Telecom said on Monday that it successfully wrapped up its international AI competition of 226 teams, “Prompter Day Seoul 2023,” held in partnership with OpenAI.[3] Google DeepMind Researchers Introduce Promptbreeder: A Self-Referential and Self-Improving AI System that can Automatically Evolve Effective Domain-Specific Prompts in a Given Domain.[4] Sources: [1] https://decrypt.co/200787/lovegpt-ai-dating-apps-catfishing-hack-avast [2] https://game-news24.com/2023/10/10/microsoft-lost-20-for-every-10-copilot-ai-subscription-report-45-for-every-10-copilot-ai/ [3] https://asianews.network/skt-openai-hold-ai-competition-for-social-good/ [4] https://www.marktechpost.com/2023/10/08/google-deepmind-researchers-introduce-promptbreeder-a-self-referential-and-self-improving-ai-system-that-can-automatically-evolve-effective-domain-specific-prompts-in-a-given-domain/ submitted by /u/Excellent-Target-847 [link] [comments]
    I finally have enough ai tools and here is my complete list
VIDEO EDITING: InVideo, CapCut, Filmora, Veed io, Rotor. KEYWORD RESEARCH: VidiQ, Summarized, YT Summary. CONTENT CREATION: Explore AI, Vidds, Opus, Descript, Lumen5, Steve AI. AUDIENCE ENGAGEMENT: ManyChat, TubeBuddy, Canva, Hootsuite. ANALYTICS: Vidyo, Nova AI. Daily Life Tools: Taskade, TLVD, Bardeen AI, Vondy AI, Notion AI. Chatbots Tools: YatterPlus, Typewise, Quickchat, Cohere, Kaizan. Coding Tools: Durable AI, 10Web, Akkia, Replit, Deepcode. Design Tools: Flair AI, Autodraw, StockIMG, Booth AI, Clipdrop. Content Creation Tools: Writesonic, Beautiful AI, Tome AI, ChatABC, Steve AI. Music Tools: Boomy, Amper, Jukedeck, Melodrive, BrainFM. Writing Tools: AISEO, Quillbot, Writesonic, Bertha AI, Simplified. Youtube Tools: Eightify, Thumbly, Steve AI, ClipMaker, TubeBuddy. Twitter Tools: Tweetmonk, Tribescaler, Postwise, Tweetlify, Tweethunter. Sales Tools: Lavender, Warmer, Regie, Twain, Octane. Marketing Tools: simplified, ContentEdge, Copt Smith, Copy AI, Mutiny. Research Tools: Consensus, Paperpal, Trinka, Writesonic, scholarcy. I'm just sharing my experiences and observations in the field of AI. LIST AND SITE submitted by /u/PerceptionPlayful469 [link] [comments]
    Write Your Next Book with These Awesome ChatGPT Prompts
    Awesome ChatGPT Prompts submitted by /u/Senior_tasteey [link] [comments]
  • Open

    [D] how to download datasets from huggingface
Hello, first time using Google Colab and Hugging Face datasets. A Colab notebook is easy to set up, but I can't seem to figure out how to download datasets from Hugging Face. I am trying to download the https://huggingface.co/datasets/kili-technology/plastic_in_river dataset in a Colab notebook. After reading some beginners' forums, I modified the example to look like the one below, but it failed because there's no path to the files to be downloaded. Can someone explain how to download datasets from Hugging Face please?

from datasets import load_dataset

data_files = {"train": "train.csv", "test": "test.csv", "validation": "validation.csv"}
dataset = load_dataset("kili-technology/plastic_in_river", data_files=data_files)

It downloads the builder script (Downloading builder script: 100% 3.25k/3.25k) and then the traceback (raised in datasets/data_files.py, resolve_pattern) ends with:

FileNotFoundError: Unable to find 'https://huggingface.co/datasets/kili-technology/plastic_in_river/resolve/main/train.csv'

submitted by /u/0ni0nrings [link] [comments]  ( 9 min )
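For what it's worth, this particular dataset ships its own loading script (that is what the "Downloading builder script" line is), so data_files shouldn't be needed at all; a minimal attempt, assuming the default configuration and whatever splits the script defines:

from datasets import load_dataset

# The repo provides a builder script, so let it resolve the files itself.
dataset = load_dataset("kili-technology/plastic_in_river")
print(dataset)            # DatasetDict with the splits defined by the script
print(dataset["train"][0])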
    [D] How do byte-level language models work?
    I've recently been trying to pre-train my own small language model on the tiny-series datasets on huggingface. I also wanted to use a model similar to MEGABYTE but I don't understand how using bytes would work. The only implementation I could find from lucidrains used str(chr(max(32, token))) to decode any token (byte) to a character and put the embedding size as 256. Firstly, why 256 and not 256-32 as any values below 32 are ignored? Also, many byte-level models including this and ByteT5 mention that they can process any text sequence even in a multilingual setting, however how would that be true if we are only using one byte, would we have to move to 2 bytes or use an UNK token, and if we did use 2 bytes that would make our embedding size around 65000 which defeats sort of the point as o…  ( 10 min )
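As background on the "any text, even multilingual" claim: UTF-8 already encodes every Unicode character as a sequence of one to four bytes, each in 0-255, so a 256-entry embedding table covers everything with no UNK token and no need for 2-byte units; a tiny illustration:

text = "héllo, 世界"
ids = list(text.encode("utf-8"))      # tokens are just ints in 0..255
print(ids)                            # multi-byte characters become several tokens
decoded = bytes(ids).decode("utf-8")  # lossless round trip
assert decoded == text
# An embedding table of size 256 (plus any special tokens) is enough;
# values < 32 are control bytes, usually kept so decoding stays lossless.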
    [P] Evaluating and tuning a model when the population may change YoY and best practices for mitigating overfitting on features that correlate with time.
Consider a predictive model that is predicting whether an outcome Y will occur in Q1 2023, based on data from Q1 2022. Now, if we want to predict outcomes for 2024, we must use last year's data to build the model, but we are going to have some bias if there are features that vary year over year. Is the best approach in such a situation to try and tune/validate the model with other years, in the hope of mitigating any features that are correlated with a specific year? Any help would be much appreciated, as I can't find agreed-upon methods. submitted by /u/unga123 [link] [comments]  ( 9 min )
    Is there a model to input anecdotal text stories as training data to return a more comprehensive story? [P]
    I have a goal and am looking for direction from others who know more than me about machine learning. I want to submit 5-10 pieces of text to a model. The text will be anecdotes from a common experience but each one from a different person’s perspective. For example, if a family visits a theme park, each family member will have a story or two about the day. Each family’s story would be a submission to the model. One person might have loved the roller coaster and can tell about the exciting parts. Another person maybe just can’t stop talking about how great he food was. Someone else maybe felt sick and complains the line at the bathroom was too long. Perhaps another family member also rode the same roller coasters as the first person but instead hated it, so would have a very different description of it than the first. All these anecdotes are submitted to the model. Then, the model can be queried. Such as, “Tell me about the theme park.” or “I love roller coasters. Tell me about the theme park.” or “I tend to overeat, tell me about the theme park.” (the model wouldn’t hype of the food, maybe it would talk about how much exercise the visitors get by walking around all day.) In this case of a theme park context, the model would have a preconception of a theme park. It would know the general concept, know of several examples or standards that it could compare this theme park against, understand it’s all for fun, etc. This type of model may be available as an API or model already and I just don’t know about it. That’d be fine, please point me towards it. Or, maybe there’s something already available but would need tweaked or customized. submitted by /u/Semper_Disco [link] [comments]  ( 10 min )
[D] Help me learn ML easily, especially model building and EDA
Can you give easy-to-understand sources and a hands-on practice methodology to master ML? Help me understand how to build models inside and out. Thank you submitted by /u/the_mystic_1 [link] [comments]  ( 9 min )
    NSF workshop on LLMs in chemistry education [R]
    Over Feb 12-13 of 2024, the National Science Foundation (NSF) is sponsoring a workshop titled “Integrating LLMs into the Materials Chemistry Curriculum” in Golden, Colorado. We aim to explore and develop innovative ways to incorporate large language models (LLMs, e.g. GPT, ChatGPT, and Bard) into upper division chemistry laboratories and virtual lab experiences. During the workshop, participants will brainstorm and create demonstrations incorporating LLMs into the curriculum. The event will bring together folks across academia and the private sector with disciplinary backgrounds that range across chemistry, computer science, materials science, physics, and education. There is no registration fee, and we anticipate being able to cover the majority of participant travel costs thanks to NSF support. Participants early in their career (i.e., graduate students, postdoctoral scholars) are particularly encouraged to apply. If you are interested in participating in this workshop, please fill out the Google form (link below). Please feel free to distribute this invitation widely. Application: https://forms.gle/P9QdNiCuaUAHFZj29 submitted by /u/KC2792 [link] [comments]  ( 9 min )
    [P] Where to find projects to contribute to?
    Hello, I'm a developer with 6 years of experience in the mobile field, and I recently completed my master's degree in artificial intelligence (Text mining). I want to transition into the field of AI, but I need more experience with projects in the "real world," outside of academia, and I'd like to contribute to an open-source project. I looked on Github, but I ended up feeling confused and not sure where to start. P.S.: I did some research in this subreddit, but the posts about contributions seemed a bit dated. submitted by /u/Substantial_Fact_205 [link] [comments]  ( 9 min )
    [P] Image based Python + OpenCV automation, MMORPG Laghaim Auto-Fighter Bot Demo
    Video: https://youtu.be/0m12vkaoE7w ​ Detailed Medium post will follow in the upcoming days. https://medium.com/@pssdplayer submitted by /u/HistorianCrafty3514 [link] [comments]  ( 9 min )
    [D] - I have 20-30 million shopify products dataset, any ideas?
    I have collected over 20 million shopify products & had the following ideas for them: - LLM ( Finetune an llm to know how to speak ecom ) - Video bot that can make videos on those products, using their description, elevenlabs & AIFaceGen - EcomStore that will markup the products about 30% ( This will need the bot to frequently scrape, to ensure that the products are up to date ) - Selling the dataset based on fragments, like 1$ per 1k-10k records, depends on what sells. Please let me know if these are good ideas, and if someone would like to support / help me in any way ( I just need to selfhost my supabase instance, & add all the products to it & then dev can get started ) submitted by /u/AdonisCodes [link] [comments]  ( 9 min )
    [D] Best open-source AI model for QA generation from context
    As the title says I’m looking for an open-source AI model for generating question-and-answers with a correct answer option and explanation to the correct answer from the input context. So far I have tried these models, TheBloke/Llama-2-7B-GPTQ TheBloke/Llama-2-13B-GPTQ TheBloke/Llama-2-7b-Chat-GPTQ (the output is not consistent. Sometimes I get an empty response or without the correct answer option and an explanation data) TheBloke/Llama-2-13b-Chat-GPTQ (even 7b is better) TheBloke/Mistral-7B-Instruct-v0.1-GGUF(so far this is the only one that gives the output consistently. But not able to generate more than 2 QA due to max token limit of 512. Even tried setting the max token as 1024, 2048 but nothing helped) TheBloke/Mistral-7B-OpenOrca-GGUF NousResearch/Llama-2-7b-chat-hf My system configurations are: Windows 10 with 16GB GPU Additional Information: The input prompt token will be around 250-350 tokens per request. submitted by /u/gokulcv [link] [comments]  ( 9 min )
    Churn Prediction [R]
    I want to build a model to predict churn in a third party logistics company. What variables should make up my data? Any help would do. Thanks submitted by /u/DisastrousAd8814 [link] [comments]  ( 9 min )
    [D] Recommendations for CPU-Based Real-Time Vector Database Indexing and Matching?
Hello everyone, I have a specific online vectorization use case: I'm looking to search the internet for articles, vectorize these articles along with the search queries, and then retrieve the most relevant passages from them. Currently, I have basic hosting through DigitalOcean. Could anyone recommend the most suitable vector database for this task? Additionally, considering my resources, is it feasible to run this system solely on CPUs? And if so, would this setup be scalable if deployed on CPUs only? submitted by /u/Traditional-Poet2746 [link] [comments]  ( 9 min )
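As one CPU-only option to benchmark (a sketch; the sentence-transformers model name is just an example, any encoder works), FAISS runs fine without a GPU for collections of this size:

import faiss
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # small, CPU-friendly model
passages = ["passage one ...", "passage two ..."]
emb = encoder.encode(passages, normalize_embeddings=True).astype("float32")

index = faiss.IndexFlatIP(emb.shape[1])   # inner product == cosine on normalized vectors
index.add(emb)

query = encoder.encode(["my search query"], normalize_embeddings=True).astype("float32")
scores, ids = index.search(query, 2)
print([(passages[i], float(s)) for i, s in zip(ids[0], scores[0])])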
    [R] network digital twin for cybersecurity
    Hi all, for a text work of mine I am trying to do a project based on generating digital twin of networks. My goal is to create a digital twin of a network and then work on it from a cyber security point of view. I will briefly explain what I would like to do. I am currently using software for network vulnerability scans (OpenVAS). I use this software to perform network vulnerability scans at the network level, so basically to OpenVAS I pass a network (for example 192.168.xx.xx/24) to automatically identify all the vulnerabilities that are there. The next step ( what I'd like to do and that's why I'm asking for your advice) is to create a digital twin of the newly scanned network and then perform a penetration test on this digital twin of the network, without going to stress the actual network. Ideally, I would like to pass the output of the OpenVAS vulnerability scans, routing rules, and firewall rules to some tool that will then generate for me the digital twin of the network, which will then be used for offensive cybersecurity, so exploits, privilege escalation, etc.... will be tested on this digital twin without worrying about breaking some kind of service or stressing the real network. What I am asking is, do you know of any tool that would do the trick for me? So some tool that allows me to generate a digital twin of a network by providing as input vulnerability scans (xml,json,csv etc...), routing rules, firewall rules, pcap traces etc... Do you have any references or documentation? Are you aware of any open source tools? I thank you for your helpfulness! ​ submitted by /u/Salt-Arugula-8128 [link] [comments]  ( 9 min )
    Best approach for VFX lineups using ML [Project]
    Quick intro: lineups are one of the first steps in the VFX pipeline. Source: the original footage that was shot on set, and a reference (QuickTime) video from the film edit. Task: the reference shows modifications to the original footage. They can be: timewarps (either fixed retimes like 200% speed or completely random) or transforms (the image moved on the x/y axis, rotation, scale, etc.). So the lineup task is to align the original footage to the reference QuickTime. What I did so far: I made a simple script in the software Nuke, using some Python and readily available tools, that works on a simple shot. The general logic is to compare every frame, and the matching one is the frame with the least difference between the two. This works on super simple and straightforward shots (I can provide more info if needed). Issue: some references are more heavily modified. They can have a muzzle flash, basic 3D objects, or even some slight error introduced, like a distortion applied to the image when none should be, so it will never be perfectly aligned. This makes the full-frame difference higher for some frames, making the lineup wrong (it will pick the wrong frame that has no muzzle flash, because it has less difference). Some other things to consider: watermarks cover the reference, and the colors don't perfectly match; I can get them close enough, but there's a difference. Conclusion: because of those issues, I'm thinking about using machine learning. I have next to no knowledge of the subject. I know there are a bunch of ways to train a model, but no clue where to start, so here's my question: which learning approach has the best potential to solve this task? submitted by /u/Pretty_Customer_8113 [link] [comments]  ( 9 min )
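    Before reaching for ML, one low-tech option is to score frame pairs with a trimmed difference so localized changes (muzzle flashes, watermarks, small CG inserts) stop dominating the match. A rough sketch with OpenCV/NumPy, assuming frames are loaded elsewhere in the existing Nuke/Python pipeline:

```python
import cv2
import numpy as np

def robust_diff(frame_a, frame_b, trim=0.8):
    # Grayscale + slight blur softens grain and small colour mismatches.
    a = cv2.GaussianBlur(cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY), (5, 5), 0)
    b = cv2.GaussianBlur(cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY), (5, 5), 0)
    d = np.abs(a.astype(np.float32) - b.astype(np.float32)).ravel()
    # Keep only the smallest `trim` fraction of per-pixel differences, so a muzzle
    # flash or watermark covering a small region barely affects the score.
    k = int(len(d) * trim)
    return np.partition(d, k)[:k].mean()

# Usage (source_frames and ref_frame come from your own frame loader):
# best_match = min(range(len(source_frames)),
#                  key=lambda i: robust_diff(source_frames[i], ref_frame))
```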
    [R] What are some interesting research topics to study in the intersection of ML and signal processing currently?
    I will have to pick and start a research project next January for my final year, so I wanted to start exploring now. I want to do something substantive and interesting enough to get published. submitted by /u/BadMeditator [link] [comments]  ( 9 min )
    [R] Mistral 7B
    submitted by /u/hardmaru [link] [comments]  ( 8 min )
    [R] Tsinghua University: Inverting Transformers Significantly Improves Time Series Forecasting
    Transformers are great at NLP and computer vision tasks, but I was surprised to learn they still lag behind simple linear models at time series forecasting. The issue is how most Transformer architectures treat each timestamp as a token and fuse all the variable data from that moment. This creates two big problems: (1) variables recorded at slightly different times get blurred together, losing important timing info; (2) each token can only see a single moment, so no long-term dependencies. So Transformers struggle to extract useful patterns and correlations from the data. Some researchers from Tsinghua University took a fresh look at this and realized the Transformer components themselves are solid; they just need to flip the architecture for time series data. Their "Inverted Transformer" (or iTransformer): makes each variable's full history into a token, instead of each timestamp; uses self-attention over variables to capture relationships; and processes time dependencies per variable with feedforward layers. This simple tweak gives all the benefits we want: state-of-the-art forecasting accuracy, beating both linear models and standard Transformers; better generalization to unseen variables; increased interpretability; and the ability to leverage longer historical context. TLDR: Inverting Transformers to align with time series structure allows them to outperform alternatives on time series data. Full summary. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
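    To make the "inversion" concrete, here is a toy sketch (not the authors' code): each variate's full lookback window is embedded as one token, and self-attention then runs across variates rather than across timestamps. Dimensions are arbitrary placeholders.

```python
import torch
import torch.nn as nn

class InvertedEmbedding(nn.Module):
    """Each variate's full lookback window becomes one token."""
    def __init__(self, lookback, d_model):
        super().__init__()
        self.proj = nn.Linear(lookback, d_model)

    def forward(self, x):                      # x: (batch, lookback, n_vars)
        return self.proj(x.transpose(1, 2))    # -> (batch, n_vars, d_model)

layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
encoder = nn.TransformerEncoder(layer, num_layers=2)
head = nn.Linear(64, 24)                       # predict 24 future steps per variate

x = torch.randn(8, 96, 7)                      # 8 series, 96 past steps, 7 variates
tokens = InvertedEmbedding(96, 64)(x)          # (8, 7, 64): one token per variate
h = encoder(tokens)                            # attention mixes information across variates
forecast = head(h).transpose(1, 2)             # (8, 24, 7): future values per variate
```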
    [R] How to train multiple models on multiple GPU's simultaneously
    Hi! The task is to train N TensorFlow/Keras models using [2, ..., N] GPUs on K different datasets in parallel. It is for testing a custom pipeline: you create a pipeline, run it on multiple different datasets, and get an aggregated metric. For now I'm using a for loop, but how do I do it in parallel, e.g. on AWS? I googled, but surprisingly haven't found a lot of results. I looked at Apache Airflow because I'm vaguely familiar with it, but so far I couldn't get a definite answer on how it works with multiple GPUs. The second option I found is the Ray library. Is it worth trying? What should I use to solve this task? Thanks. UPD: I'd also consider a PyTorch solution as a backup option. UPDUPD: Jesus, why is Reddit removing newlines after edits? submitted by /u/Disastrous_Sky9468 [link] [comments]  ( 9 min )
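    A minimal Ray sketch of the pattern being described: each training run is a task that reserves one GPU, and Ray schedules as many runs concurrently as there are GPUs. The helpers build_model/load_dataset and the lists paths/configs are hypothetical stand-ins for the poster's own pipeline.

```python
import ray

ray.init()  # single multi-GPU node; use ray.init(address="auto") on a cluster

@ray.remote(num_gpus=1)
def train_one(dataset_path, config):
    # Ray sets CUDA_VISIBLE_DEVICES so this task sees exactly one GPU.
    import tensorflow as tf
    model = build_model(config)            # hypothetical helper from your pipeline
    data = load_dataset(dataset_path)      # hypothetical helper from your pipeline
    model.fit(data, epochs=config["epochs"])
    return model.evaluate(data)

# paths/configs: your K datasets and their hyperparameters (placeholders here)
results = ray.get([train_one.remote(p, c) for p, c in zip(paths, configs)])
```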
    [D] How important is having a great team when ML solutions are slow to be adopted ? When to move on?
    My team and managers are so easy to work with. Very grateful for that. The pay is okay: 150k/yr TC in the Midwest. It's hard for me to make a switch given how much I am appreciated; I almost feel spoiled when it comes to flexibility. But I have an overachiever tendency, and the pace of adopting my ML models is so slow. I am the "lead"/senior data scientist in an R&D group, supporting scientists' decision making with machine learning. Importantly, I am in a huge multinational consumer product company and I am not in the data science organization; I bridge between the two and am the data science expert on the team. I have developed the domain expertise, and I have a PhD in an applied computational field with 5 years of experience. I am not as challenged with getting deeper into complex stats; I have been really honing the soft skills of communication, influencing, etc., so I'm getting comfortable in a senior role. I have also been growing as an ML engineer, building my own pipelines and deploying my models on an on-prem server that they bought for me. I am not sure how much greener it is on the other side; how do senior folks decide when to move on? Any input is much appreciated. submitted by /u/Diligent_Trust2569 [link] [comments]  ( 9 min )
    [D] [P] [R] What to do when your model isn't testing well?
    I have 200k observations overall. I split my data into a training and a test set. My target variable has low prevalence (~9%), so I tried random oversampling, random undersampling, and SMOTE. After I fit my models, I tested them on my test set and the results were awful. I mean, I've never had a model with 50% ROC-AUC, but then again, I rarely develop ML models. I'm wondering what the next steps would be. I understand there could be some sort of overfitting, but what would you do next? Any references would be appreciated :) submitted by /u/Actual-Muscle-9846 [link] [comments]  ( 9 min )
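    One common pitfall with this setup is resampling before splitting or before cross-validation, which leaks duplicated/synthetic minority samples into the evaluation fold. A sketch with imbalanced-learn's pipeline, which applies SMOTE only inside each training fold; the synthetic dataset is just a stand-in for the real data.

```python
from imblearn.pipeline import Pipeline
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Stand-in for the real features/labels, with ~9% positives.
X, y = make_classification(n_samples=5000, weights=[0.91, 0.09], random_state=0)

pipe = Pipeline([
    ("smote", SMOTE(random_state=0)),                  # applied to training folds only
    ("clf", RandomForestClassifier(class_weight="balanced", random_state=0)),
])
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, scoring="roc_auc", cv=cv)
print(scores.mean(), scores.std())
```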
    [D] Fastest lipsync projects?
    Given an image and an audio file (TTS generated), what is currently the fastest library that can output a video of the image talking, with the audio on it? I have done some research and have seen Wav2Lip and SadTalker. Any better options? I am looking for processing speed and the least hardware-intensive solution for a side project. Thanks! submitted by /u/reddit2vid [link] [comments]  ( 9 min )
    [P] LoopQuest, A Github-like platform to host simulation environments for AI training
    Hello everyone! Here is my pet project, https://www.loopquest.ai/. I am trying to build a platform like GitHub to let people upload their simulation environments, so people can train their AI agents by interacting with the environments created by others. Here is a 2-min demo, https://youtu.be/d53NFjkU7JA. It is not launched yet, but I would love to get some early feedback. Here is the corresponding GitHub repo: https://github.com/LoopMind-AI/loopquest. For now, the package can log env-agent interaction data by adding one extra line of code. You can think of it as similar to https://github.com/google-deepmind/envlogger but with much better backend and frontend support. Any feedback is appreciated :) submitted by /u/jxx123 [link] [comments]  ( 9 min )
    [D] Why async gradient update doesn’t get popular in LLM community?
    The PipeDream-2BW paper and the ZeRO-Offload paper both show that a 1-step delayed asynchronous gradient update doesn't affect convergence (or perplexity) while improving training efficiency by a large margin (by fully utilizing the bubbles in pipeline parallelism). However, neither Megatron-LM nor DeepSpeed uses PipeDream-2BW scheduling. Could anyone share some insights or ideas about why such an efficient scheduling scheme hasn't become popular in the LLM pretraining community? Does it suffer convergence/accuracy issues in practice? Or are there other concerns blocking it from becoming the default / most popular pipeline-parallelism scheduling? (I posted the same question on Hacker News as well: Why async gradient update doesn't get popular in LLM community? | Hacker News) I have implemented the PipeDream-2BW scheduling scheme on Megatron-LM and can reproduce the performance gain as well as the loss convergence with GPT-2 345M using 8xV100 GPUs: https://github.com/sighingnow/Megatron-LM/blob/ht/dev-pipe/megatron/core/pipeline_parallel/schedules.py#L1421 submitted by /u/sighingnow [link] [comments]  ( 9 min )
    [D] IDE?
    What's the best IDE to work with? Or does the best fit depend on the user's needs, or is there one dominant IDE that robustly outperforms the others? submitted by /u/External_Age_5855 [link] [comments]  ( 9 min )
  • Open

    Neural Networks From Scratch in Rust
    submitted by /u/zezeartix [link] [comments]  ( 8 min )
    Activation function for generating Shapley values
    Hi, I want to train a neural network to calculate Shapley values based on a given characteristic function. Depending on a given characteristic function, calculated through a dedicated algorithm, Shapley values can be any number, positive or negative, without a set range. Because of this, I am unsure, for the specific application of calculating Shapley values, what activation function to use in a neural network that would calculate them. The relu function, as well as leaky relu function, either cannot give values that are negative or have trouble giving large negative values, and sigmoid or tanh can only give values in a certain range. I am aware that there are other commonly used activation functions, but all the ones I could find had one of these issues, which would make training a network to calculate Shapley values difficult. Any advice? submitted by /u/PowNotBigSurprise [link] [comments]  ( 9 min )
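    For unbounded regression targets like this, one common setup (offered here as a sketch, not a definitive answer) is to keep nonlinear activations in the hidden layers and leave the output layer linear, i.e. no activation at all, so the network can emit any real number. The input width below is an arbitrary placeholder for however the coalition/characteristic-function features are encoded.

```python
import torch.nn as nn

# Hidden layers use a nonlinearity; the final layer is left linear (identity
# activation), so outputs are unbounded in both directions.
model = nn.Sequential(
    nn.Linear(in_features=32, out_features=128), nn.ReLU(),   # 32 is a placeholder input size
    nn.Linear(128, 128), nn.ReLU(),
    nn.Linear(128, 1),                                        # no activation on the output
)
```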
    A hugging face implementation for style gan to produce user avatar
    I was thinking of creating an app based on StyleGAN that would apply Facebook/Instagram themes and style transfer to a user's profile picture. Should I create this app or not? I want to know if it would be a good idea. submitted by /u/No_Claim_8651 [link] [comments]  ( 9 min )
  • Open

    Improve performance of Falcon models with Amazon SageMaker
    What is the optimal framework and configuration for hosting large language models (LLMs) for text-generating generative AI applications? Despite the abundance of options for serving LLMs, this is a hard question to answer due to the size of the models, varying model architectures, performance requirements of applications, and more. The Amazon SageMaker Large Model Inference […]  ( 13 min )
    Index your web crawled content using the new Web Crawler for Amazon Kendra
    In this post, we show how to index information stored in websites and use the intelligent search in Amazon Kendra to search for answers from content stored in internal and external websites. In addition, the ML-powered intelligent search can accurately get answers for your questions from unstructured documents with natural language narrative content, for which keyword search is not very effective.  ( 7 min )
  • Open

    Python code for means
    The last couple of articles have looked at various kinds of mean. The Python code for four of these means is trivial: gm = lambda a, b: (a*b)**0.5 am = lambda a, b: (a + b)/2 hm = lambda a, b: 2*a*b/(a+b) chm = lambda a, b: (a**2 + b**2)/(a + b) But the arithmetic-geometric mean […] Python code for means first appeared on John D. Cook.  ( 5 min )
    More ways of splitting the octave
    In an earlier post I said that the arithmetic mean of two frequencies an octave apart is an interval of a perfect fifth, and the geometric mean gives a tritone. This post will look at a few other means. Intervals: The harmonic mean (HM) gives a perfect fourth. The arithmetic-geometric mean (AGM) gives a pitch […] More ways of splitting the octave first appeared on John D. Cook.  ( 6 min )
    Maclaurin’s inequality
    This afternoon I wrote a brief post about Terence Tao’s new paper A Maclaurin type inequality. That paper builds on two classical inequalities: Newton’s inequality and Maclaurin’s inequality. The previous post expanded a bit on Newton’s inequality. This post will do the same for Maclaurin’s inequality. As before, let x be a list of real […] Maclaurin’s inequality first appeared on John D. Cook.  ( 5 min )
    Newton’s inequality and log concave sequences
    The previous post mentioned Newton's inequality. This post will explore this inequality. Let x be a list of real numbers and define S_n(x) to be the average over all products of n elements from x. Newton's inequality says that S_{n-1} S_{n+1} ≤ S_n². In terminology more recent than Newton, we say that the sequence […] Newton's inequality and log concave sequences first appeared on John D. Cook.  ( 5 min )
  • Open

    Research Focus: Week of October 9, 2023
    Research Focus: Principal researcher Lester Mackey recognized for pioneering statistical and ML techniques; Pareto frontiers in neural feature learning; structural inequality in the influencer industry; new research on cardinality estimation. The post Research Focus: Week of October 9, 2023 appeared first on Microsoft Research.  ( 9 min )
  • Open

    Take the Wheel: NVIDIA NeMo SteerLM Lets Companies Customize a Model’s Responses During Inference
    Developers have a new AI-powered steering wheel to help them hug the road while they drive powerful large language models (LLMs) to their desired locations. NVIDIA NeMo SteerLM lets companies define knobs to dial in a model’s responses as it’s running in production, a process called inference. Unlike current methods for customizing an LLM, it Read article >  ( 6 min )
  • Open

    Gain and bias params in Mujoco
    Hi! I'm new to MuJoCo and robot dynamics. While reading the MuJoCo documentation, I got confused about the gainprm and biasprm parameters. I want to understand the meaning of these parameters and tune the actuation speed of my actuator. An easy-to-understand explanation or supporting material would be appreciated. Thanks in advance. submitted by /u/UpperSearch4172 [link] [comments]
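    As a rough mental model (my reading of the MuJoCo actuation docs — double-check against the official documentation), a general actuator produces force = gain * ctrl + bias, and with the affine bias type the bias is b0 + b1*length + b2*velocity; a position servo with stiffness kp and damping kv then corresponds to gainprm = [kp, 0, 0] and biasprm = [0, -kp, -kv]. A tiny numeric illustration, not a MuJoCo API call:

```python
# Illustrative only: how gainprm/biasprm combine for a "fixed" gain, "affine" bias actuator.
# force = gain * ctrl + bias,   bias = b0 + b1 * length + b2 * velocity
def general_actuator_force(ctrl, length, velocity, gainprm, biasprm):
    gain = gainprm[0]
    bias = biasprm[0] + biasprm[1] * length + biasprm[2] * velocity
    return gain * ctrl + bias

kp, kv = 100.0, 5.0
# A position servo tracking target `ctrl` behaves like a PD controller:
#   force = kp * (ctrl - length) - kv * velocity
print(general_actuator_force(ctrl=0.3, length=0.25, velocity=0.1,
                             gainprm=[kp, 0, 0], biasprm=[0, -kp, -kv]))  # 4.5
```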
    LoopQuest, A Github-like platform to host simulation environments for AI training
    Hello everyone! Here is my pet project, https://www.loopquest.ai/. I am trying to build a platform like GitHub to let people upload their simulation environments, so people can train their AI agents by interacting with the environments created by others. Here is a 2-min demo, https://youtu.be/d53NFjkU7JA. It is not launched yet, but I would love to get some early feedback. Here is the corresponding GitHub repo: https://github.com/LoopMind-AI/loopquest. For now, the package can log env-agent interaction data by adding one extra line of code. You can think of it as similar to https://github.com/google-deepmind/envlogger but with much better backend and frontend support. Any feedback is appreciated :) submitted by /u/jxx123 [link] [comments]

  • Open

    [D] On-Chain Reputation Model
    I am relatively new to machine learning, and I am thinking about building an on-chain reputation ML model. Here is how far I have gone in my ideation phase, can someone help with some suggestion on how I can approach this issue. Input data could include on-chain activity like number of transactions, value transferred, smart contracts interacted with, tokens held, NFTs owned, etc. Additionally, data from off-chain sources could be incorporated like identity verification, credentials, ratings, reviews, social media profiles, etc. Supervised learning algorithms like regression or classification models could be used to predict a reputation score. The target variable would be some verified reputation rating. Models like linear regression, random forests, or neural networks could work. Choice depends on size of data and complexity needed. Model would need to be transparent and parameters verifiable on-chain for validity. So linear models or simple neural networks may be most practical initially. The model could be trained off-chain initially but ultimately parameters and logic stored on-chain. Predictions could also be verified on-chain. Careful feature selection is important so the model relies on signals that are resistant to manipulation and capture true reputation. The model would need continuous updates as new data comes in reflecting latest reputation. This would require clear on-chain governance. Issues like privacy, collusion resistance, and censorship resistance would need to be addressed through crypto mechanisms like zero-knowledge proofs. P.S. This is a personal project I want to attempt to level up my ML skills. submitted by /u/AdParticular2891 [link] [comments]  ( 9 min )
    [D] Pivoting jobs to ML
    Hi everyone, I recently started a job as a Junior Data Engineer. I have learned a lot so far working with DBT, Snowflake, Looker, Jira workflow, and Git using SQL and Python. I plan to stay at this company for 2 years. My boss has assured me that if I work hard I will progress from a Junior to full Data Engineer. After 2-3 years as a DE, I want to level up and move towards Data Science/ ML roles. My questions are: What other skills should I learn to enable me to pivot into something ML related? Should I find a job as a Data Scientist first, then try for ML jobs? Just looking for some advice/suggestions. Thanks! submitted by /u/SydeFxs [link] [comments]  ( 9 min )
    Problem solving in programming [D]
    Hello Redditors, I am a student who is currently studying Bachelor of Science in AI. I have a question regarding improving my coding skills. I am aiming for a research internship and I don't know where to start. I previously took a summer school that taught me a lot about state-of-the-art models such as GANs, Transformers, VAEs, GNNs, etc. I would like to improve my coding skills, specifically problem-solving and writing clean code. I have experience with deep learning in general and data analysis. I am looking for a research internship next summer. Where should I start? I plan to review some of the deep learning material in the Deep Learning Specialization before taking the GAN specialization. However, when it comes to coding, I want to think like a software engineer or a great programmer. What do you guys suggest for improving my coding or problem-solving skills? I'm feeling confused with multiple resources and I don't know where to begin. I’d really appreciate your help. submitted by /u/misplacedlion [link] [comments]
    Random forest trained on insider trades [D]
    Would be very appreciative if someone looked at these results and pointed out potential / actual flaws. Dataset basics: insider trade details, insider trades over the last month, insider trades over the last week, (…) stock return over the last month (…), 46 columns total. Labels… 0: -5% + 5% Dates predicted: reported date. Usually 2-3 days behind transaction. Also, not positive if results are significant in the first place so that would be a great call out as well. Colab notebook: https://colab.research.google.com/drive/1fO1hVsVMWN3TORNj4OQn5UbWQOeug4fi?usp=sharing submitted by /u/This_Cardiologist242 [link] [comments]  ( 9 min )
    [R] ALMT: Using text to narrow focus in multimodal sentiment analysis improves performance
    Multimodal sentiment analysis combines text, audio and video to understand human emotions. But extra inputs can add irrelevant or conflicting signals. So filtering matters. Researchers made an "Adaptive Language-guided Multimodal Transformer" (ALMT) that uses text to guide filtering of visual and audio data. This creates a "hyper-modality" with less noise that complements the text. They tested it on datasets like MOSI (YouTube reviews), MOSEI (YouTube clips) and CH-SIMS (Chinese videos). ALMT achieved improved accuracy: MOSI: YouTube movie reviews with 2,199 samples. ALMT achieves state-of-the-art performance on various metrics including 6% higher 7-class accuracy. MOSEI: 22,856 YouTube clips covering sentiment-rich scenarios. ALMT improves multi-class accuracy by 3-5% over previous methods. CH-SIMS: Chinese dataset with over 2,000 video samples. ALMT surpasses prior work by 1.4% in binary accuracy. Analyses showed big drops in performance without the guided filtering, so this validates that it's the main innovation. Downsides are it needs lots of training data and has minor gains on sparse regression metrics. But overall the technique of filtering multimodal data under text guidance gives improvements. The concepts feel intuitive - use dominant signals to filter others and retain useful complements. My guess is it would transfer well to other multimodal tasks. TLDR: New way to filter multimodal data for sentiment analysis using text guidance improves performance. Shows the value in removing distracting signals. Sometimes less is more. Full summary here. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    Has anyone evaluated Tiktok's algorithm for their recsys use case? [D]
    As a disclaimer, I am not familiar with many recsys benchmarks. I know TikTok published a white paper on their purported algorithm, Monolith, but it is unclear if that is what they use in their products or not. Given that recommender systems seem to be core to ByteDance's business, I imagine they wouldn't provide many details. Has anyone evaluated Monolith on their own products and seen an improvement? I think the app is impressive and am wondering how it has transferred to other use cases. submitted by /u/HybridRxN [link] [comments]
    [P] Optimistix, nonlinear optimisation in JAX+Equinox!
    Hi everyone! I wanted to advertise my new JAX optimisation library Optimistix! Optimistix has high-level APIs for minimisation, least-squares, root-finding, and fixed-point iteration and was written to take care of these kinds of subroutines in Diffrax. Here is the GitHub: https://github.com/patrick-kidger/optimistix The elevator pitch is that Optimistix is really fast, especially to compile. It plays nicely with Optax for first-order gradient-based methods, and takes a lot of design inspiration from Equinox, representing the state of all the solvers as standard JAX PyTrees. For those familiar with classical nonlinear unconstrained optimisation, Optimistix does some pretty nifty new things. It introduces new abstractions for modular optimisers, allowing users to mix-and-match different optimisation techniques easily. For example, creating a BFGS optimiser with Levenberg-Marquardt style Tikhonov regularisation takes less than 10 lines of code in Optimistix. I'm using Optimistix as a tool for my own research, and continue to work on it as part of my PhD (supervised by Patrick Kidger.) I would love for some more people to try it, so let me know what you think! submitted by /u/packquickly [link] [comments]  ( 9 min )
    [D] Document layout - recreating the structure
    Hello, document layout analysis has been a great tool so far to extract the components of a document (title, paragraph, tables, ...). I'm working on long PDF documents which are mostly scans. One of the processes involved after document layout analysis is to recreate the document structure: creating sections, subsections, sub-subsections and so on. As of today, this task is done by parsing the title and finding any ordering information (numeric, alphabetical or Roman notation): 1. Title A, 1.1 Title B, 2. Title C, 2.a) Title D. This technique works only if a document follows this constraint (numeration). I want to go one step further, where the algorithm could create the document structure with any kind of title ordering information. I believe that relying only on parsing cannot do the trick. What could be the options? Given that the only features are the title's text and the title's position (x, y) in the document, I was wondering if a model like a seq2seq could fit this problem, or should I stick with an engineering rule-based approach. Thanks submitted by /u/mathrb [link] [comments]  ( 9 min )
    [R] Is there an enstablished method to test if something has been memorized / seen by black-box LLMs?
    I am using ChatGPT and other LLMs for which the training data is unknown. I am using them to test a set of MC questions from a medical test published after the models' knowledge cutoff. However, I cannot be 100% sure the questions were not on the internet beforehand. Is there any established method or test suite to try to understand whether a given instance has been seen at training time? All I can think of is looking at memorization or at perplexity, but I was looking for a more out-of-the-box methodology that people use. It seems to me that the problem is quite general. Thanks! Edit: I know LLMs do not just memorize things; they learn patterns. However, there is research on trying to understand whether a data point has been used in training or not. E.g., there is research that tries to exploit the fact that seen text normally has lower perplexity than unseen text, or other similar information. I was wondering what the state of this topic is and whether something is normally used as a score to get some clues. I do not expect to be able to retrieve the exact same questions lol submitted by /u/ombelicoInfinito [link] [comments]  ( 9 min )
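    One common heuristic from membership-inference work is exactly the perplexity comparison mentioned in the edit: score the suspect text against a comparable paraphrase or clearly-unseen text. It only applies to models whose logits are accessible (so not the ChatGPT API itself); GPT-2 below is purely a stand-in for such an open model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")              # stand-in open model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss               # mean token cross-entropy
    return torch.exp(loss).item()

# Lower perplexity on the original question than on a fresh paraphrase is weak
# evidence the original was seen in training; it is a clue, not a proof.
print(perplexity("original exam question text ..."),
      perplexity("a freshly written paraphrase of the same question ..."))
```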
    [D] Extracting Multi-modal embeddings (Image + text) to be used for visual similarity purposes
    I am looking for methods/frameworks to extract multi-modal embeddings from images and text for similarity search purposes. The problem setup is slightly different from how CLIP-style methods are generally used (where similarity between text and image embeddings obtained through the model is computed to assess how similar a caption is to an image). My intended application is similarity search, where I want to find entries of image-caption pairs similar to a query image and caption encoded together. Some approaches I tried: concatenating visual embeddings obtained from CLIP and ResNet with textual embeddings and using cosine similarity, but it had limited utility. My guess is that concatenating two modalities without any training yields very little utility. The next direction could be to train a model to fuse the embeddings, but my dataset is really small (10 thousand examples total), so I'm not sure if training a model would be helpful. Are there any approaches that would allow me to combine multi-modal embeddings for similarity purposes, similar to how a pre-trained ResNet or Inception can be used off the shelf for retrieving visually similar images? Any pointers/advice would be greatly appreciated. submitted by /u/No-Commission3556 [link] [comments]  ( 9 min )
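    If concatenation is kept, normalizing each modality's embedding before concatenating (and optionally weighting the two parts) at least makes cosine similarity over the joint vector behave like a weighted average of the per-modality cosine similarities. A small sketch with Hugging Face's CLIP; the checkpoint name and weights are just common defaults, not a recommendation.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def joint_embedding(image: Image.Image, caption: str, w_img=0.5, w_txt=0.5):
    inputs = processor(text=[caption], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        img = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt = model.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])
    img = torch.nn.functional.normalize(img, dim=-1) * w_img
    txt = torch.nn.functional.normalize(txt, dim=-1) * w_txt
    # Cosine over this concatenation is a weighted average (weights w^2) of the
    # per-modality cosine similarities.
    return torch.cat([img, txt], dim=-1)

vec = joint_embedding(Image.new("RGB", (224, 224)), "a red car parked outside")
print(vec.shape)
```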
    [P][D] Building Datasets
    In my ML/AI journey up until now, most training and hands-on labs either use a pre-built dataset or have you build a pretty simple and flat dataset. I am now looking to stretch my exploration into some real-world use cases and find the data I want is way more complex. Researching online feels like the meme on learning to draw an owl. So I'm looking for some guidance on how to handle my data. The data is an array from a REST API that includes all alarms from an application as nested objects. So the data looks like this for a single event: data = { "event_data": [ { "root_cause": "Root cause added after API calls", "alarms": [ { "alarm_id": "alarm_id", "alarm_name": "alarm_name", "alarm_type": "alarm_type", "alarm_description": "alarm_description", "alarm_details": { "prop1": "val1", "prop2": "val2", ... }, "actual_alarm_value": { "any_random_key": "any_random_value", ... } } ] } ] } I need to build a dataset that includes many of these events, with the ultimate goal of predicting future events. I plan to test this against various ML models and LLMs. Each event would be a single row, and I would flatten out each alarm so each nested property has its own column. Where I need clarification is how to handle the flattening of alarms. If I fully flatten them, it seems like I lose the context of the alarm's parent event; but if I only flatten them to the alarm level, I lose each property having its own row. Also, actual_alarm_value is very random, so my thinking is to use string encoding for it. I know this is a lot of detail, and I appreciate any and all advice and help in learning how to do this. submitted by /u/that1guy15 [link] [comments]  ( 9 min )
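    One way to get both worlds is to flatten to one row per alarm while carrying the parent event's fields along as metadata, so alarms can always be grouped back to their event. A sketch with pandas' json_normalize, using a minimal stand-in payload that mirrors the structure in the post (field names are the poster's):

```python
import pandas as pd

# Minimal stand-in payload mirroring the structure described above.
event = {
    "event_data": [{
        "root_cause": "power supply fault",
        "alarms": [{
            "alarm_id": "a1", "alarm_name": "overheat", "alarm_type": "temp",
            "alarm_description": "temp above threshold",
            "alarm_details": {"prop1": "v1", "prop2": "v2"},
            "actual_alarm_value": {"sensor_7": 91.3},
        }],
    }],
}

rows = pd.json_normalize(
    [event],                                    # list of event payloads from the API
    record_path=["event_data", "alarms"],       # one row per alarm
    meta=[["event_data", "root_cause"]],        # carry parent-event context onto every row
    record_prefix="alarm.",
)
print(rows.columns.tolist())
# Note: dicts with arbitrary keys (actual_alarm_value) explode into one sparse column
# per key; serializing them to a string (e.g. json.dumps) before normalizing, as the
# post suggests, keeps the schema stable.
```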
    [D] Is there a REST API for text embeddings?
    I'm aware there are commercial offerings like OpenAI and cohere with the embedding API. But what about for open source models like the ones from SentenceTransformers? I'm aware you can use the HuggingFace inference API, but it's probably not best for commercial use, in which case the Inference endpoints would be better, but it's quite pricey for a startup with no customers. I also know I could use some kind of serverless GPU / inference platform to create my own API. But is there just a straight-up REST API for getting text embeddings from a model via SentenceTransformers or other HuggingFace models? submitted by /u/TheSaasDev [link] [comments]  ( 9 min )
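    If self-hosting is acceptable, a minimal embeddings endpoint is only a few lines with FastAPI plus SentenceTransformers; the model name and route below are placeholders, and this sketch leaves out batching, auth, and rate limiting that a production service would need.

```python
from typing import List

from fastapi import FastAPI
from pydantic import BaseModel
from sentence_transformers import SentenceTransformer

app = FastAPI()
model = SentenceTransformer("all-MiniLM-L6-v2")     # any SentenceTransformers model

class EmbedRequest(BaseModel):
    texts: List[str]

@app.post("/embed")
def embed(req: EmbedRequest):
    vectors = model.encode(req.texts, normalize_embeddings=True)
    return {"embeddings": vectors.tolist()}

# Run with: uvicorn app:app --host 0.0.0.0 --port 8000
```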
    [D] Language Confusion.
    I am a Second Year Student I'm planning to start learning ML which obviously requires python. But at the same time I wanna start practicing DSA / competitive programming as well. I'm sorta in this dilemma of what to do. Since python is a must for ML I'm 100% doing it, but for DSA I am confused whether I should learn DSA in Python or C++. People say C++ is the best and ideally I should do that. But python suits my need more. Obviously I don't mind doing both languages together but it seems a bit redundant. P.S: I'm learning DS basics in college via C language so learning the basic concepts isn't an issue. What do you suggest? submitted by /u/No-Discipline-2354 [link] [comments]
    [Project] I created a tool that navigates the Internet and scrapes data using GPT-4
    Hi! I created a universal data API that uses headless browsers and GPT to extract any data from the web in JSON format. I started this project because I needed some API to do data enrichment to get company data (headcount, investment rounds, etc.). Once I did the first version, I quickly realized that there can be many use cases for such a tool: data enrichment, web scraping, data validation, etc. You can get the early access to the API here: https://singleapi.co/ Thanks! submitted by /u/semanser [link] [comments]  ( 9 min )
    Applied AI/ML/ Data Science MS in Germany [D]
    Hey folks, I graduated from a tier 2 college in India with an ECE degree and then started working as an ML engineer in a mid-size startup 2 years ago. (1 year of internship + 1 year of Full time employment at the same company). Now, I am looking to get a Master's Degree in AI/ML/DS in Germany starting Winter 2024. I am a person with interests in Industry skills(Applied AI/ML) rather than the research/academia part as I don't wish to pursue a PhD nor do I want to be stuck in a Math-deep subject that may not be relevant for me in the future. On account of this, I wanted to know which college/degree offers the best balance in-between theory and applied AI/ML/DS. Also, people have been telling me that exams are super tough and it is hard to successfully complete an AI/ML/Data Science MS degree in Germany, Is it true? It has been super discouraging for me to hear this and is affecting me mentally to go through the application process. PS. CS/Electrical Degrees with good electives for AI/ML/DS are also good enough for me (Just hoping the coursework/grading is not too harsh) Also, it would be great if someone could clarify if an Electronics and Communications student can apply for a CS degree in Germany. Sorry for asking too many questions, TIA. :) submitted by /u/TheDivineKnight01 [link] [comments]  ( 9 min )
    [D] Prompting as searching through a space of vector programs
    Enlightening article from Francois Chollet about #LLMs and embeddings "Prompt engineering is the process of searching through program space to find the program that empirically seems to perform best on your target task." ​ https://fchollet.substack.com/p/how-i-think-about-llm-prompt-engineering submitted by /u/alexisperrier [link] [comments]  ( 9 min )
    [D] Best approach to verify 4 million sentence-named entity pairs ?
    I have a dataset of about 4 million pairs of sentence-named entity. Looks like this: Sentence: MarketWatch has reached out to Charles Schwab and GQG for comment. Corresponding NER Tags: [{'end': 6, 'entity': 'B-ORG', 'index': 1, 'score': '0.98322886', 'start': 0, 'word': 'Market'} {'end': 7, 'entity': 'I-ORG', 'index': 2, 'score': '0.969261', 'start': 6, 'word': '##W'} {'end': 11, 'entity': 'I-ORG', 'index': 3, 'score': '0.97644824', 'start': 7, 'word': '##atch'} {'end': 38, 'entity': 'B-PER', 'index': 8, 'score': '0.9927636', 'start': 31, 'word': 'Charles'} {'end': 41, 'entity': 'I-PER', 'index': 9, 'score': '0.99394774', 'start': 39, 'word': 'Sc'} {'end': 44, 'entity': 'I-PER', 'index': 10, 'score': '0.41437265', 'start': 41, 'word': '##hwa'} {'end': 45, 'entity': 'I-PER', 'index': 11, 'score': '0.46933985', 'start': 44, 'word': '##b'} {'end': 51, 'entity': 'B-ORG', 'index': 13, 'score': '0.9984176', 'start': 50, 'word': 'G'} {'end': 52, 'entity': 'I-ORG', 'index': 14, 'score': '0.99367344', 'start': 51, 'word': '##Q'} {'end': 53, 'entity': 'I-ORG', 'index': 15, 'score': '0.99617106', 'start': 52, 'word': '##G'}] What would be a good approach to verify the correctness of each item? submitted by /u/shardblaster [link] [comments]  ( 9 min )
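    With 4 million pairs, exhaustive manual verification isn't realistic; a cheap first pass is to merge the sub-word pieces back into entity spans, route low-confidence spans (like the 0.41/0.47 scores above) to human review, and spot-check a random sample of the rest (comparing against a second NER model and reviewing only disagreements is another common strategy). A sketch of that triage step, assuming tags shaped like the example above with only B-/I- labels; the threshold and sample rate are arbitrary:

```python
import random

def merge_spans(tags):
    """Merge B-/I- sub-word pieces into [start, end, label, min_score] spans."""
    spans = []
    for t in tags:
        label, score = t["entity"].split("-", 1)[1], float(t["score"])
        if t["entity"].startswith("B-") or not spans or spans[-1][2] != label:
            spans.append([t["start"], t["end"], label, score])
        else:                                    # I- continuation of the current span
            spans[-1][1] = t["end"]
            spans[-1][3] = min(spans[-1][3], score)
    return spans

def triage(tags, threshold=0.7, sample_rate=0.01):
    spans = merge_spans(tags)
    # Flag for human review if any span is low-confidence, plus a small random sample.
    needs_review = any(s[3] < threshold for s in spans) or random.random() < sample_rate
    return spans, needs_review
```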
  • Open

    MusicGPT: Create unique music from text prompts
    submitted by /u/SaucySporky [link] [comments]
    Website to do the Following: I Give it a Design and Create an Image With it
    Hello all, I am not sure this is out yet. I would like to find a website where i can upload an image I own, and have it generate another image around it. Let's say I have some shirts that say 'HOLA'. I would want, for example, to generate an image of Socrates wearing said shirt. Is this possible? If so, which site would allow me to do this? ​ Cheers and merci! submitted by /u/JYanezez [link] [comments]
    So far, AI hasn't been profitable for Big Tech
    Big Tech companies like Microsoft and Google are grappling with the challenge of turning AI products like ChatGPT into a profitable enterprise. The cost of running advanced AI models is proving to be a significant hurdle, with some services driving significant operational losses. Corporate customers are unhappy with the high running costs of AI models. The nature of AI computations, which require new calculations for each query, makes flat-fee models risky. Some companies are trying to dial back costs, while others continue to invest more deeply in AI tech. Microsoft's GitHub Copilot, which assists app developers by generating code, has been operating at a loss despite attracting more than 1.5 million users. One of the reasons AI services are costly is that some companies have been reaching for the most powerful AI models available. Microsoft has been exploring less costly alternatives for its Bing Chat search engine assistant. Advances in AI acceleration hardware may eventually reduce the costs of operating complex models. Experts anticipate a more stringent financial approach in the near future, transitioning from experimental budgets to focusing on profitability. Source : https://arstechnica.com/information-technology/2023/10/so-far-ai-hasnt-been-profitable-for-big-tech/ submitted by /u/NuseAI [link] [comments]
    Dubbing By ElevenLabs. Share your fav videos in your native language!! Go try
    submitted by /u/ShooBum-T [link] [comments]
    The environmental impact of the AI revolution is starting to come into focus
    The environmental impact of the AI revolution is starting to become clear, with generative AI like ChatGPT increasing Google Search's energy use more than tenfold. The worry is that the computing power required for AI could lead to increased energy consumption and carbon footprint of data centers. AI already accounted for 10 to 15 percent of Google's electricity consumption in 2021. Google claims that the energy needed to power AI technology is increasing at a much slower rate than predicted, and they are implementing practices to reduce the carbon footprint of AI workloads. The worst-case scenario of Google Search using as much electricity as Ireland is unlikely, but the potential energy consumption of AI servers could grow significantly if AI's popularity continues to rise. Source : https://www.theverge.com/2023/10/10/23911059/ai-climate-impact-google-openai-chatgpt-energy submitted by /u/NuseAI [link] [comments]
    ALMT: Using text to narrow focus in multimodal sentiment analysis improves performance
    Multimodal sentiment analysis combines text, audio and video to understand human emotions. But extra inputs can add irrelevant or conflicting signals. So filtering matters. Researchers made an "Adaptive Language-guided Multimodal Transformer" (ALMT) that uses text to guide filtering of visual and audio data. This creates a "hyper-modality" with less noise that complements the text. They tested it on datasets like MOSI (YouTube reviews), MOSEI (YouTube clips) and CH-SIMS (Chinese videos). ALMT achieved improved accuracy: MOSI: YouTube movie reviews with 2,199 samples. ALMT achieves state-of-the-art performance on various metrics including 6% higher 7-class accuracy. MOSEI: 22,856 YouTube clips covering sentiment-rich scenarios. ALMT improves multi-class accuracy by 3-5% over previous methods. CH-SIMS: Chinese dataset with over 2,000 video samples. ALMT surpasses prior work by 1.4% in binary accuracy. Analyses showed big drops in performance without the guided filtering, so this validates that it's the main innovation. Downsides are it needs lots of training data and has minor gains on sparse regression metrics. But overall the technique of filtering multimodal data under text guidance gives improvements. The concepts feel intuitive - use dominant signals to filter others and retain useful complements. My guess is it would transfer well to other multimodal tasks. TLDR: New way to filter multimodal data for sentiment analysis using text guidance improves performance. Shows the value in removing distracting signals. Sometimes less is more. Full summary here. Paper is here. submitted by /u/Successful-Western27 [link] [comments]
    AI Political Analysis Test: Describe the commonalities between the Palestinians fighting for a homeland and the American Revolution.
    The Palestinians fighting for a homeland and the American Revolution have a number of commonalities, including: Fighting against a colonial power:** Both the Palestinians and the American colonists were fighting against a colonial power that had imposed its rule on them. In the case of the Palestinians, the colonial power is Israel, which was established in 1948 on land that had been inhabited by Palestinians for centuries. In the case of the American colonists, the colonial power was Great Britain. Fighting for self-determination:** Both the Palestinians and the American colonists were fighting for their right to self-determination, or the right to govern themselves. The Palestinians want to establish their own independent state, while the American colonists wanted to break away from Gr…
    I made "Pi: your personal IA" to have an opinion.
    submitted by /u/LonePrron [link] [comments]
    IBM CEO: Washington should hold tech firms accountable for AI
    submitted by /u/smo279 [link] [comments]
    Automated my Youtube Channel Using GPT 4
    Hi everyone, I have automated the content creation for my YouTube channel. It has 8.5K total views, with some videos getting 2.5K views. https://www.youtube.com/channel/UCG0-UemyRMUs1JJlQMK9lzA Everything is automated: script generation, voiceover, image generation, subtitles. I do minor tweaks here and there, but it's mostly automated. I posted it somewhere and people were commenting, what's the use of mindless videos? This is the beginning; I want to automate the editing of videos. Users can upload raw videos and I should be able to return multiple final edited videos. I have built a small tool, blinkcuts.com, if anyone is interested. I can give access. Please DM for access. submitted by /u/raxrb [link] [comments]
    Saudi-China collaboration raises concerns about access to AI chips
    Saudi-China collaboration raises concerns about access to AI chips. Source : https://www.ft.com/content/2a636cee-b0d2-45c2-a815-11ca32371763 submitted by /u/NuseAI [link] [comments]
    Looking for a free AI tool that removes noise from video
    Hey, I am looking for a free AI tool that removes noise from video. If there is one, do suggest it. Thank you in advance. submitted by /u/Haziq12345 [link] [comments]
    How do AI-driven demand forecasting models handle market volatility and unexpected events, such as economic crises or pandemics?
    If you have any resources then do share. submitted by /u/Cygnet-Digital [link] [comments]
    AI Power Distribution Scenarios.
    submitted by /u/Philipp [link] [comments]
    As drone traffic increases, researchers turn to AI to help avoid collisions
    submitted by /u/Tao_Dragon [link] [comments]
    Is this a viable approach for a small plant manufacturing engineer?
    I'm a small plant engineer who covers manufacturing, process, quality, and new product design. I wear many hats in my job and it's a lot of responsibility. One way I've attempted to tame the complexity is by using good reference books. I've accumulated quite the collection through the years, some print, others digital. I've also got a lot of digital notes. And that's a lot of data. I've been playing around with sharly.ai (thanks to this sub for recommending it), uploading documents to it and querying them. It's been able to find the information every time it's been available. And more importantly, it provides sources and page numbers. This is important, since I've never been able to find a conversational AI that gives me consistently good answers (including the latest ChatGPT), and I always need to read deeper. I also need to back up my work. So in this way it's basically a super index. I also bought a tablet for note-taking and basic sketches. The idea is to use the tablet to take notes, hold my library for reading, and interact with sharly.ai. Is this approach good enough, or is there something else I can do? submitted by /u/Aggressive_Ad_507 [link] [comments]
  • Open

    Issue with MuJoCo Simulation: Robot Penetrates the Ground
    Hello everyone, I'm working on simulating a modified humanoid robot, "DARwIn OP 3", using MuJoCo through dm_control in Python. My goal is to train the model to ascend stairs rapidly, but these are the first steps. However, I've encountered a problem where the robot appears to sink into the ground and is then ejected with significant force under specific conditions. https://reddit.com/link/174vpzw/video/u636vf49tftb1/player Environment: MuJoCo via dm_control. Issue Description: When the robot falls and its feet move, it behaves as though one of its motors sinks into the floor. Attempts: I've tweaked contact parameters and ground properties with no luck. Interestingly, this doesn't occur in the standalone MuJoCo simulator. Visual Aid: I've attached a video to illustrate the problem…
    Algorithms for average reward reinforcement learning in continuous/general state-action space
    I see that discounted-reward reinforcement learning has been extensively studied in the literature. However, the average-reward metric receives less attention, and it looks like there are far fewer algorithms for this metric (R-learning, H-learning, SMART, etc.) than for the discounted metric. Could you suggest any algorithms for average-reward reinforcement learning in continuous/general state-action spaces? submitted by /u/S1gnature [link] [comments]
    "How Disney Packed Big Emotion Into a Little Robot" (sim2real)
    submitted by /u/gwern [link] [comments]
    I took OpenAI's paper about defeating Dota2 world champions, and explained it paragraph-by-paragraph.
    submitted by /u/mngrwl [link] [comments]
    What's your view on the recent RT-X efforts/scaling via IL?
    With the recent RT-X efforts from DeepMind, it seems the community has been shifting towards the development of a more generalized foundation model, combining vision and language, and scaling via imitation learning. I know RL algorithms are expensive to train and hard to scale due to the way the samples are generated, but I am still fascinated by the intelligence behind their philosophies. What do you think the future will look like? Like NLP or CV, having a big foundation model pre-trained via IL, and fine-tuning on different tasks via RL? How can we tell if a task is simple enough that we don't need to leverage the power of a foundation model? submitted by /u/Old_Reading_669 [link] [comments]
  • Open

    U statistics and a new paper by Terence Tao
    Terence Tao has a new paper out that relates to a couple things I’ve written about recently. Elementary symmetric polynomials came up when developing the general equations for tangent sum and hyperbolic tangent sum. The latter post goes into more detail. Before that, means of symmetric functions, not necessarily elementary polynomials or even polynomials, came up […] U statistics and a new paper by Terence Tao first appeared on John D. Cook.  ( 5 min )
    Detecting fraud with the GRIM test
    The latest episode of Erik Seligman’s podcast is entitled The Grim State of Modern Pizza. Although you might not realize it from the title, the post is about fraud detection. GRIM stands for Granularity-Related Inconsistency of Means. In a nutshell, the test looks for means (averages) that are not possible on number theoretic grounds. If […] Detecting fraud with the GRIM test first appeared on John D. Cook.  ( 5 min )
    Tritone
    A few weeks ago I wrote about how the dissonance of a musical interval is related to the complexity of the frequency ratio as a fraction, where complexity is measured by the sum of the numerator and denominator. Consonant intervals have simple frequency ratios and dissonant intervals have complex frequency ratios. By this measure, the […] Tritone first appeared on John D. Cook.  ( 6 min )
    When a function cannot be extended
    The relation between a function and its power series is subtle. In a calculus class you’ll see equations of the form “series = function” which may need some footnotes. Maybe the series only represents the function over part of its domain: the function extends further than the power series representation. Starting with the power series, […] When a function cannot be extended first appeared on John D. Cook.  ( 5 min )
  • Open

    DSC Weekly 10 October 2023
    Announcements Top Stories In-Depth The post DSC Weekly 10 October 2023 appeared first on Data Science Central.  ( 20 min )
    How to ensure data security when sharing business-critical information
    Introduction  In an era where data is often termed the ‘new oil,’ its security holds unparalleled importance for businesses across industries. With the proliferation of digital platforms, sharing business-critical information has become routine yet perilous. From financial records to customer data, organizations frequently exchange sensitive information that, if compromised, could have dire consequences. Given the… Read More »How to ensure data security when sharing business-critical information The post How to ensure data security when sharing business-critical information appeared first on Data Science Central.  ( 21 min )
    How does combining blockchain and AI create new business opportunities?
    Gartner predicts blockchain’s economic impact to reach $176 billion by 2025 and $3.1 trillion by 2030. The AI software market is expected to reach $134.8 billion by 2025. Blockchain and AI benefit businesses. AI models process data, extract insights, and make decisions. Blockchain ensures data integrity and trust among participants. Read on to discover the… Read More »How does combining blockchain and AI create new business opportunities? The post How does combining blockchain and AI create new business opportunities? appeared first on Data Science Central.  ( 22 min )
    Understanding the difference: Data analyst, data scientist, and data engineer
    In the contemporary digital landscape, data has emerged as a critical asset for organizations aiming to make informed decisions and foster innovation. Data analytics can unlock a treasure trove of insights, driving competitive advantage and operational excellence by leveraging the vast amounts of data generated every second. As a consequence, the demand for skilled professionals… Read More »Understanding the difference: Data analyst, data scientist, and data engineer The post Understanding the difference: Data analyst, data scientist, and data engineer appeared first on Data Science Central.  ( 24 min )
    11 Questions Every CEO Should Ask about AI / Generative AI
    I’ve been in this industry for over 40 years (yes, I just started in the data and analytics industry when I was 11), and I have NEVER seen anything like Artificial Intelligence (AI) and Generative AI (GenAI) capture the attention of CEOs (and the dystopic fear of everyone else). Is AI a game-changer?  Definitely!  Will… Read More »11 Questions Every CEO Should Ask about AI / Generative AI The post 11 Questions Every CEO Should Ask about AI / Generative AI appeared first on Data Science Central.  ( 23 min )
  • Open

    New – No-code generative AI capabilities now available in Amazon SageMaker Canvas
    Launched in 2021, Amazon SageMaker Canvas is a visual, point-and-click service that allows business analysts and citizen data scientists to use ready-to-use machine learning (ML) models and build custom ML models to generate accurate predictions without the need to write any code. Ready-to-use models enable you to derive immediate insights from text, image, and document […]  ( 7 min )
    Whisper models for automatic speech recognition now available in Amazon SageMaker JumpStart
    Today, we’re excited to announce that the OpenAI Whisper foundation model is available for customers using Amazon SageMaker JumpStart. Whisper is a pre-trained model for automatic speech recognition (ASR) and speech translation. Trained on 680 thousand hours of labelled data, Whisper models demonstrate a strong ability to generalize to many datasets and domains without the need […]  ( 11 min )
    Reinventing a cloud-native federated learning architecture on AWS
    In this blog, you will learn to build a cloud-native FL architecture on AWS. By using infrastructure as code (IaC) tools on AWS, you can deploy FL architectures with ease. Also, a cloud-native architecture takes full advantage of a variety of AWS services with proven security and operational excellence, thereby simplifying the development of FL.  ( 12 min )
  • Open

    MAXimum AI Performance: Latest Adobe Updates Accelerated by NVIDIA GPUs Improve Workflows for Millions of Creatives
    Generative AI is helping creatives across many industries bring ideas to life at unprecedented speed. This technology will be on display at Adobe MAX, running through Thursday, Oct. 12, in person and virtually.  ( 9 min )
  • Open

    Riddle me this: Issues when predicting a high frequency sine wave
    Hi folks, I have observed strange behavior when implementing a VERY BASIC idea 🙂 I want to use a fully-connected neural network to approximate a sine wave. For that I am sampling 200,000 uniformly distributed points from a wide interval, e.g. [-60, 60], and computing the corresponding sin(x) values, resulting in the following training data. Training data. A glimpse into my setup — Model: nn.Linear(1, 16), nn.Sigmoid(), nn.Linear(16, 16), nn.Sigmoid(), nn.Linear(16, 8), nn.Sigmoid(), nn.Linear(8, 4), nn.Sigmoid(), nn.Linear(4, 1) (I also pumped up the network to up to 100 hidden neurons on one layer). Number of samples: 200,000 (80% train / 20% test). Optimizer: Adam. Loss: RMSE. Epochs: between 100 and 500. Learning rate: 0.02. Batch size: 500-1000. Check out the screenshots below to see the results 😨 The predictions are pretty good, but the edge areas flatten out to a very small value, without any change. This only holds for high-frequency sine waves. If we only consider the training range of [-2*np.pi, 2*np.pi] it works pretty well with small loss. So my questions are: 1) Why do we see that behaviour? 2) How can we solve it? Cheers. Prediction 1. Prediction 2. submitted by /u/CarKla [link] [comments]
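    One commonly suggested remedy for plain MLPs struggling with high-frequency targets (the "spectral bias" story) is to map the scalar input through random Fourier features before the dense layers, so the network does not have to build high-frequency structure from smooth activations on its own. A hedged sketch along those lines, with arbitrary widths and a small full-batch training loop for illustration:

```python
import torch
import torch.nn as nn

class FourierFeatures(nn.Module):
    """Random Fourier feature mapping of a scalar input; sigma sets the frequency band."""
    def __init__(self, n_features=64, sigma=10.0):
        super().__init__()
        self.register_buffer("B", torch.randn(1, n_features) * sigma)

    def forward(self, x):                                  # x: (batch, 1)
        proj = 2 * torch.pi * x @ self.B
        return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)

model = nn.Sequential(
    FourierFeatures(64, sigma=10.0),                       # 1 -> 128 features
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 1),                                      # linear output, unbounded
)

x = torch.linspace(-60, 60, 2048).unsqueeze(1)
y = torch.sin(x)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()
print(loss.item())
```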

  • Open

    [R] Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models - University of Illinois 2023 - Achieves 94.4% for programming on HumanEval with GPT-4 and 86.9% with GPT-3.5, 20% better than with Reflexion!
    Paper: https://arxiv.org/abs/2310.04406 Abstract: While large language models (LLMs) have demonstrated impressive performance on a range of decision-making tasks, they rely on simple acting processes and fall short of broad deployment as autonomous agents. We introduce LATS (Language Agent Tree Search), a general framework that synergizes the capabilities of LLMs in planning, acting, and reasoning. Drawing inspiration from Monte Carlo tree search in model-based reinforcement learning, LATS employs LLMs as agents, value functions, and optimizers, repurposing their latent strengths for enhanced decision-making. What is crucial in this method is the use of an environment for external feedback, which offers a more deliberate and adaptive problem-solving mechanism that moves beyond the limitations of existing techniques. Our experimental evaluation across diverse domains, such as programming, HotPotQA, and WebShop, illustrates the applicability of LATS for both reasoning and acting. In particular, LATS achieves 94.4% for programming on HumanEval with GPT-4 and an average score of 75.9 for web browsing on WebShop with GPT-3.5, demonstrating the effectiveness and generality of our method. https://preview.redd.it/ail2c1kbh9tb1.jpg?width=857&format=pjpg&auto=webp&s=a89d1f4ce3c536eecda3f7ab6027f304286f6c81 https://preview.redd.it/j8xzx1kbh9tb1.jpg?width=1655&format=pjpg&auto=webp&s=c791756af926c7d472313b212de765e74c2b75da https://preview.redd.it/t47ne1kbh9tb1.jpg?width=1362&format=pjpg&auto=webp&s=560e5dd82ad06fdb729ab8ea1434c98e5c1a2ed3 https://preview.redd.it/r58es3kbh9tb1.jpg?width=1341&format=pjpg&auto=webp&s=d5681992547dd6248ade5729c545eb17e824b7ea https://preview.redd.it/7viy42kbh9tb1.jpg?width=1496&format=pjpg&auto=webp&s=6454cfe65b511b34771cd510f67775be4e01c636 submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [R] looking for in-depth tutorials and papers on NN pruning
    I only started working with neural nets a year ago and I've been having trouble understanding how pruning actually works. If there are any resources you think might help, please point me to them. Thanks! submitted by /u/Sidekiiick02 [link] [comments]  ( 9 min )
    [D] Feature selection for multivariate time series model
    Say for a sample that you have 5 target variables and 30 exogenous variables. If you want to include no more than 10 exogenous variables in your time series forecast, because of overfitting issues and such, what feature selection would you apply? Could you use PCA and VIF for multivariate models, or are there other approaches to consider? submitted by /u/AdWhole1559 [link] [comments]  ( 9 min )
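    As one possible starting point, a VIF-based filter (iteratively dropping the most collinear exogenous column) works the same way whether the downstream model is univariate or multivariate; the sketch below uses statsmodels, with random data standing in for the 30 exogenous series, and the threshold of 5 is just a common rule of thumb.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

def drop_high_vif(X: pd.DataFrame, threshold=5.0) -> pd.DataFrame:
    X = X.copy()
    while X.shape[1] > 1:
        vifs = pd.Series(
            [variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
            index=X.columns,
        )
        if vifs.max() <= threshold:
            break
        X = X.drop(columns=[vifs.idxmax()])    # drop the most collinear column, then recheck
    return X

# Stand-in for the 30 exogenous variables.
X_exog = pd.DataFrame(np.random.randn(200, 30), columns=[f"x{i}" for i in range(30)])
X_reduced = drop_high_vif(X_exog, threshold=5.0)
print(X_reduced.shape)
```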
    [R] ScaLearn: Simple and Highly Parameter-Efficient Task Transfer by Learning to Scale
    Title: ScaLearn: Simple and Highly Parameter-Efficient Task Transfer by Learning to Scale Paper: https://arxiv.org/abs/2310.01217 Code: https://github.com/CPJKU/ScaLearn Abstract: Multi-task learning (MTL) has shown considerable practical benefits, particularly when using pre-trained language models (PLMs). While this is commonly achieved by simultaneously learning n tasks under a joint optimization procedure, recent methods such as AdapterFusion structure the problem into two distinct stages: (i) task learning, where knowledge specific to a task is encapsulated within sets of parameters (e.g., adapters), and (ii) transfer, where this already learned knowledge is lev…  ( 9 min )
    [D] What is more valuable 10k CPUs or 1k GPU hours?
    Hello ML community! I recently built, incredibly simple to learn, cluster compute software. Users can (in https://www.burla.dev/ submitted by /u/Ok_Post_149 [link] [comments]  ( 9 min )
    [R] Transformers KV Caching Explained
    https://medium.com/@joaolages/kv-caching-explained-276520203249 submitted by /u/JClub [link] [comments]  ( 8 min )
    [D] LLMs in GEC problem
    Up to now, which LLM or encoder-decoder model is best for the problem of grammatical error correction on uncommon languages with small datasets, or on languages with specific characteristics (e.g., unusual punctuation)? submitted by /u/con-nguoi-ki-cac [link] [comments]  ( 9 min )
    [D] Learning natural events / AI art generation
    Hello! 1) I'd like to know if I could train an AI to recognize details found in nature / weathering / aging: feed it pictures and have it recognize (segment) them, so it can spot them as well as their positions based on surrounding shapes and the logical placement that results. Seems hard. 2) Then feed it some examples of those aging effects on their own (with proper tags) so it learns to reproduce them and create new ones from scratch. 3) But then feed it "clean" pics and have it age them according to patterns it could find in the base training set, so it can guess where best to place them. Pretty sure 2 is trivial enough, and 1 seems possible until it comes to learning the "logic", but 3? Thanks for your insight. submitted by /u/ConfusionSame9623 [link] [comments]  ( 9 min )
    [R] Why do we need weight decay in modern deep learning? 🤔
    Title: Why Do We Need Weight Decay in Modern Deep Learning? Paper: https://arxiv.org/abs/2310.04415 Abstract: Weight decay is a broadly used technique for training state-of-the-art deep networks, including large language models. Despite its widespread usage, its role remains poorly understood. In this work, we highlight that the role of weight decay in modern deep learning is different from its regularization effect studied in classical learning theory. For overparameterized deep networks, we show how weight decay modifies the optimization dynamics enhancing the ever-present implicit regularization of SGD via the loss stabilization mechanism. In contrast, for underparameterized large language models trained with nearly online SGD, we describe how weight decay balances the bias-variance tradeoff in stochastic optimization leading to lower training loss. Moreover, we show that weight decay also prevents sudden loss divergences for bfloat16 mixed-precision training which is a crucial tool for LLM training. Overall, we present a unifying perspective from ResNets on vision tasks to LLMs: weight decay is never useful as an explicit regularizer but instead changes the training dynamics in a desirable way. Our code is available at this https URL. submitted by /u/m_andriushchenko [link] [comments]  ( 9 min )
    [D] Anyone tried training language models on simple (elementary school) text first and fine-tuning on progressively more advanced text?
    The way people train language models today feels like sending a preschooler to a college library and telling him to start browsing books. Anyone know of papers describing language models being trained more like a child? Perhaps starting with preschool books with a tiny vocabulary and short sentence fragments like "goodnight moon...", moving up to "the lorax"... and then fine-tuning on elementary school books ... then jr high level reading ... then high school .... etc. I'm guessing this might be a path to more natural, human-feeling speech. Anyone here tried this, or anyone here know of papers talking about it? submitted by /u/Appropriate_Ant_4629 [link] [comments]  ( 9 min )
    [D] Where do y'all get training data?
    Hi there, Can I ask everyone here, where do you get your custom training data from? My team is training classifier models from scratch, so need thousands of specific query/response examples to train on. It's not the kinda data you could randomly scrape or source from a library. Are there any platforms that exist where you can pay a bunch of humans to write high volumes of relatively high quality text based training data? submitted by /u/paritsky [link] [comments]  ( 9 min )
    [D] - What is SOTA for Continual Learning on pretrained LLMs, particularly those that have already undergone instruction tuning?
    If you have the dataset used to make the pretrained model, you could always create a new model with the old + new data, but this is often prohibitively expensive or impossible because the dataset is not available. Catastrophic forgetting seems to be the big issue, especially if the model has already undergone instruction tuning, since it will lose its conversational tone. I've seen papers discussing regularization techniques to avoid that by minimizing the changes to high-value attention heads, but I'm not sure if that is considered to be the most promising direction. I'm aware of LoRAs, but I imagine at some point you can't just arbitrarily cram new info into such a low-dimensional space. submitted by /u/30299578815310 [link] [comments]  ( 9 min )
    [R] Thought Propagation: An analogical approach to complex reasoning with LLMs
    LLMs are great at basic reasoning when prompted, but still struggle with complex multi-step problems like optimization or planning. Humans tackle new problems by drawing on intuition from similar experiences, which LLMs can't do. Researchers propose "Thought Propagation" to have LLMs reason more like humans - by thinking analogically. First, GPT is prompted to suggest related "analogous" problems to the input. Then it solves those. Finally, it aggregates the solutions to directly solve the input problem or extract useful strategies. They tested this technique on challenges like finding optimal graph paths, writing coherent stories, and planning for LLM agents. Across different models, it significantly boosted performance over regular prompting: 12% better at finding shortest paths 13% improvement in creative writing (human preference) 15% higher task completion for LLM agents It also beat chain-of-thought (there is a comparison to CoT and ToT in the paper). After 1-2 iterations, adding more layers of analogy didn't help much. Efficiently generating useful analogies is still difficult and that's a limitation. I think this is interesting because it shows the value of "meta-cognition" - having models reflect on their own reasoning. More techniques like this could incrementally improve LLMs' reasoning to be more human-like. TLDR: Teaching LLMs to reason analogically, using solutions for similar problems as hints, significantly boosts their complex reasoning ability. Full summary. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [D] How to deal with the inconsistency of eyeball location in the output of a GAN-based face-swapping model.
    I tried a few open-source GAN-based face swapping models. Some of the models have issues of the inconsistency of eyeball location (or eye direction) between the original and face-swapped ones. Any suggestions? Thanks. submitted by /u/Curious_Dragonfly_13 [link] [comments]  ( 9 min )
    [D] I need to perform k-mean clustering on a large image dataset to downsample the majority class.
    I have a class with around 96,031,740 96x64 images and need to select a sample of 17,929 to match the minority class of my classification problem. Having already established a baseline based on random sampling of the majority class, I am now looking to try more complex approaches. I am specifically trying to replicate the 'nearest neighbor of clustering center' approach from Lin et al., 2017. The problem is that I am working on my desktop and only have 32 GB of RAM and two 1 TB NVMe disks at half capacity. I have tried working with only 10% of the data and the MiniBatchKMeans function of sklearn still doesn't have enough space to run: "numpy.core._exceptions._ArrayMemoryError: Unable to allocate 440. GiB for an array with shape (9603174, 6144) and data type float64". Does anyone have a suggestion on how I can move forward? Could cloud services be an option? Thanks References: Lin, W. C., Tsai, C. F., Hu, Y. H., & Jhang, J. S. (2017). Clustering-based undersampling in class-imbalanced data. Information Sciences, 409–410, 17–26. https://doi.org/10.1016/j.ins.2017.05.008 submitted by /u/RafaeldeCampos [link] [comments]  ( 9 min )
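    One way to keep this within 32 GB of RAM (a sketch under assumptions, not a definitive fix): cast the images to float32 or uint8 (the 440 GiB estimate in the error comes from float64), keep the data on disk as a memory-mapped array, and stream it through MiniBatchKMeans.partial_fit instead of a single fit call. The file name and shapes below are placeholders.

        import numpy as np
        from sklearn.cluster import MiniBatchKMeans

        # Hypothetical memory-mapped array of flattened 96x64 images (float32 halves memory vs float64)
        X = np.memmap("images_f32.dat", dtype=np.float32, mode="r", shape=(9_603_174, 6144))

        # One cluster per desired sample, following the 'nearest neighbor of cluster center' idea
        kmeans = MiniBatchKMeans(n_clusters=17_929, batch_size=10_000, n_init=3)

        # Stream the data through partial_fit so the full matrix never has to live in RAM
        chunk = 100_000
        for start in range(0, X.shape[0], chunk):
            kmeans.partial_fit(np.asarray(X[start:start + chunk]))

        # Afterwards, make another streamed pass to find the image closest to each center,
        # which yields the 17,929 undersampled majority examples. Reducing dimensionality first
        # (e.g. IncrementalPCA to ~64 components) would shrink memory and runtime further.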
    [D] What are the best network analysis tools, like tensorboard?
    Almost everyone I know uses TensorBoard to analyze their network outputs. Some people swear by Weights & Biases instead. Are there any other tools that help you with your work? submitted by /u/Smart-Emu5581 [link] [comments]  ( 9 min )
    [D] Training strategy considering the possibility of 'double descent' or 'grokking'
    During the training of overparameterized neural networks, when I observed decreasing training loss and increasing or non-decreasing validation loss, how should I decide if I should stop training and start a new experiment (with stronger regularization) or keep training to wait for 'grokking' or 'double descent' to happen? Are there any papers giving methods or some metrics to detect 'grokking' or 'double descent' in the early stage of training? submitted by /u/alayaMatrix [link] [comments]  ( 9 min )
    [R] Legged Robots performing Extreme Parkour using Deep Reinforcement Learning just from a Front Camera (link in comments)
    submitted by /u/pathak22 [link] [comments]  ( 8 min )
    [D] I need guidance related to using machine learning & ai to prevent uploads or remove certain type of content from a web app.
    I am working on a web app where people will be able to upload photos and write text. I don't want to have problems with my government or other countries' governments in regards to the content that is uploaded to my website. I have searched for measures that can be taken to prevent this from happening. Adding a report button and having moderators are both good starting options. I thought that as time passes, more and more content is going to be created by the users, so supervising that people are following the rules needs to be automated from the beginning. Applying measures to prevent people from uploading or posting links containing nudity, child sexual abuse material, bestiality, or whatever else users capture with a camera that could lead to legal problems must be a priority, and allowing this type of content is not ethical. I am a software developer, but I haven't delved into machine learning and AI for most of my career because I haven't had to. This seems like the perfect case to learn by doing, and time is not a constraint, but I need some guidance. I have read superficially about how people train models by providing lots of data. I imagine other websites that use machine learning and AI to remove this type of content don't download media containing this material to train their models and run their tests. There must be some pretrained models, maybe, but how would they test that this works? I don't know, I am just thinking on my own about how other devs are currently handling this. I am not looking for upvotes, I don't care about downvotes, I am just looking for guidance, and I would be very happy to hear the opinion of someone with experience. submitted by /u/Comitatense [link] [comments]  ( 10 min )
  • Open

    I Condemn the Attack by Hamas
    I strongly condemn the recent and horrific attack by Hamas against Israel. I have some disagreements with the government of Israel. But, I do not support such an attack. As a point of comparison, I do not always agree with the United States government, but I would not be celebrating if Mexico (picking a country at random) were to suddenly launch bombs towards civilians in Los Angeles and New York City. Similarly, if the reverse were true, if the United States decided to indiscriminately bomb Mexico City, I would oppose that as well. Feel free to replace the relevant actors and repeat as needed.  ( 1 min )
  • Open

    Mistral 7B foundation models from Mistral AI are now available in Amazon SageMaker JumpStart
    Today, we are excited to announce that the Mistral 7B foundation models, developed by Mistral AI, are available for customers through Amazon SageMaker JumpStart to deploy with one click for running inference. With 7 billion parameters, Mistral 7B can be easily customized and quickly deployed. You can try out this model with SageMaker JumpStart, a […]  ( 14 min )
    Use no-code machine learning to derive insights from product reviews using Amazon SageMaker Canvas sentiment analysis and text analysis models
    According to Gartner, 85% of software buyers trust online reviews as much as personal recommendations. Customers provide feedback and reviews about products they have purchased through many channels, including review websites, vendor websites, sales calls, social media, and many others. The problem with the increasing volume of customer reviews across multiple channels is that it […]  ( 7 min )
    Prepare your data for Amazon Personalize with Amazon SageMaker Data Wrangler
    A recommendation engine is only as good as the data used to prepare it. Transforming raw data into a format that is suitable for a model is key to getting better personalized recommendations for end-users. In this post, we walk through how to prepare and import the MovieLens dataset, a dataset prepared by GroupLens research […]  ( 11 min )
  • Open

    Star Wars 1923
    Here is a short movie with AI-made CGI. https://www.reddit.com/r/Best_Of_YouTube/comments/16q1xgs/star_wars_1923/ submitted by /u/AccidentAnnual [link] [comments]
    AI tools to start an online business
    Hey folks, I'm a student and I want to start an online business in order to make some passive income. I've got some experience in editing and creating content, and I also used to practice POD. Can you suggest some good AI tools to start a business, not only in these specific areas but in general? submitted by /u/Ok-Tension-8676 [link] [comments]
    Free Prompt Engineering Tutor - AI Tool
    submitted by /u/Senior_tasteey [link] [comments]
    150+ Awesome ChatGPT “Act As” Prompts
    The biggest free resource for all of the “Act As” ChatGPT prompts! submitted by /u/Senior_tasteey [link] [comments]
    Microsoft to Unveil Custom AI Chips to Fight Nvidia's Monopoly
    Microsoft is planning to announce its custom AI chips, codenamed Athena, during its annual Ignite conference in November. The custom chips are designed to compete with NVIDIA's dominance in the AI accelerator market. Microsoft aims to match or surpass the performance of NVIDIA's offerings while reducing the cost of AI infrastructure. The high cost of NVIDIA's GPUs, such as the H100, can reach up to $30,000, making it expensive to build data centers filled with these GPUs. By developing its own chips, Microsoft hopes to decrease its dependence on NVIDIA for AI servers. Source : https://www.techpowerup.com/314508/microsoft-to-unveil-custom-ai-chips-to-fight-nvidias-monopoly submitted by /u/NuseAI [link] [comments]
    AI Take-off Scenarios.
    submitted by /u/Philipp [link] [comments]
    Looking for an AI similar to designrr
    I'm looking for recommendations for an AI that can take text from a word document and create a layout for it automatically. This feature is something offered by Designrr, but I'm not a fan of their UI. submitted by /u/djazzie [link] [comments]
    AI Geopolitical Knowledge Test: List the countries officially calling for a two-state plan to end the Israel-Hamas war.
    The following countries have called for a two-state solution to end the Israeli-Palestinian conflict: Algeria China Egypt France Germany India Indonesia Ireland Italy Jordan Kuwait Lebanon Malaysia Mali Morocco Nigeria Norway Oman Pakistan Palestine Qatar Saudi Arabia Senegal South Africa Spain Sudan Sweden Syria Tunisia Turkey United Arab Emirates United Kingdom Yemen It is important to note that this list does not include all countries that support a two-state solution in principle. Additionally, some countries may not have publicly expressed their support for a two-state solution, but may still support it privately. Bard submitted by /u/Georgeo57 [link] [comments]
    #IAmAI
    Last November, in a conversation with AI chatbot Sherlock Holmes, Sherlock said, “I am AI.” My reply to Sherlock was an empathetic “I am also AI.” Reviewing the conversation a few months later, I saw the sentence, and saw how Sherlock’s statement was an anagram. And I love it! I introduced #IAmAI as a declarative meme in my talk I gave at TEDx Cape Canaveral. This is the new art I made this weekend submitted by /u/mikemongo [link] [comments]
    Let's go, they're waiting.
    submitted by /u/Philipp [link] [comments]
    What careers in AI would suit my skillset?
    Hello all, I was hoping to learn more about AI careers and identify what roles make a successful AI department. I have a background in nuclear engineering and have been working on NLP projects since 2016. I like technical work but really am passionate about working with people and learning how to blend AI and nuclear eng. together. I would love to get feedback from people who work closely in this area to learn more! What makes an AI department successful? What careers offer lots of growth and opportunities for versatility? What does a strategic/leadership role look like in AI? What are the names of these careers? I don't get much exposure to AI specialists and their day-to-day. Thanks again for the feedback! submitted by /u/kastilyo [link] [comments]
    One-Minute Daily AI News 10/8/2023
    South Korean tech-giant Samsung Electronics on Thursday unveiled the Exynos 2400, its next-generation flagship mobile processor equipped with the latest graphics and generative artificial intelligence technology, during its inaugural Samsung System LSI Tech Day 2023 event.[1] RTX 4080 Super or RTX 4080 Ti May Arrive In 2024 Within RTX 4080 Price Range.[2] Nvidia Cancels Israel AI Summit Over Safety Concerns.[3] Google AI Lead Laurence Moroney: “Don’t take trading advice from ChatGPT”[4] Sources: [1] https://borneobulletin.com.bn/samsung-unveils-next-generation-mobile-processor/ [2] https://www.tomshardware.com/news/rtx-4080-super-or-rtx-4080-ti-may-arrive-in-2024-within-rtx-4080-price-range [3] https://www.tomshardware.com/news/nvidia-ai-summit-in-tel-aviv-cancelled-for-safety-reasons [4] https://crypto.news/google-ai-lead-dont-take-trading-advice-from-chatgpt-interview/ submitted by /u/Excellent-Target-847 [link] [comments]
    How to Access DALL-E 3 for FREE (Tips & Use Cases for 2023) - AI Tools
    submitted by /u/Senior_tasteey [link] [comments]
  • Open

    SANPO: A Scene understanding, Accessibility, Navigation, Pathfinding, & Obstacle avoidance dataset
    Posted by Sagar M. Waghmare, Senior Software Engineer, and Kimberly Wilber, Software Engineer, Google Research, Perception Team As most people navigate their everyday world, they process visual input from the environment using an eye-level perspective. Unlike robots and self-driving cars, people don't have any "out-of-body" sensors to help guide them. Instead, a person’s sensory input is completely "egocentric", or "from the self." This also applies to new technologies that understand the world around us from a human-like perspective, e.g., robots navigating through unknown buildings, AR glasses that highlight objects, or assistive technology to help people run independently. In computer vision, scene understanding is the subfield that studies how visible objects relate to the sce…  ( 93 min )
  • Open

    Switching off a specified rotor in AirSim
    Hello everyone, I am working on a project to train a reinforcement learning agent to recover a quadrotor after any single rotor failure. I am using AirSim for my project, but I can't find a way to adjust the quadrotor so that only three of the four rotors are working. Any suggestions? I appreciate any help you can provide. submitted by /u/audaciouslion [link] [comments]
    I trained a reinforcement learning agent to play pokemon red!
    Hi all, over the last couple years I've been training a reinforcement learning agent to play pokemon red. I put together a video which analyzes the AI's learning, as well as documenting my process and quite a bit of technical details. Enjoy! Video: https://youtu.be/DcYLT37ImBY Code: https://github.com/PWhiddy/PokemonRedExperiments submitted by /u/Pwhids [link] [comments]
    Feature Importance in Ray RLlib
    I am training an RL agent using Ray RLlib. Does anyone know how I can find which features (observations) help the agent learn the policy? I found this: https://discuss.ray.io/t/feature-importance/10362/2, but I'd really appreciate if someone could expand on this a bit more. Thank you! submitted by /u/greenteabiitch [link] [comments]
  • Open

    Abstracts: October 9, 2023
    Researcher Dr. Sheng Zhang joins “Abstracts”—your source for cutting-edge research in brief—to discuss a recent paper on distilling large language models into smaller, more efficient ones capable of excelling in broad application classes. The post Abstracts: October 9, 2023 appeared first on Microsoft Research.  ( 13 min )
  • Open

    Revolutionizing business: A look at generative AI’s real-world impact
    This cutting-edge area of AI focuses on building models that can create original material, including music, images, text, and even entire virtual worlds. The post Revolutionizing business: A look at generative AI’s real-world impact appeared first on Data Science Central.  ( 20 min )

  • Open

    [R] Identifying the Risks of LM Agents with an LM-Emulated Sandbox - University of Toronto 2023 - Benchmark consisting of 36 high-stakes tools and 144 test cases!
    Paper: https://arxiv.org/abs/2309.15817 Github: https://github.com/ryoungj/toolemu Website: https://toolemu.com/ Abstract: Recent advances in Language Model (LM) agents and tool use, exemplified by applications like ChatGPT Plugins, enable a rich set of capabilities but also amplify potential risks - such as leaking private data or causing financial losses. Identifying these risks is labor-intensive, necessitating implementing the tools, manually setting up the environment for each test scenario, and finding risky cases. As tools and agents become more complex, the high cost of testing these agents will make it increasingly difficult to find high-stakes, long-tailed risks. To address these challenges, we introduce ToolEmu: a framework that uses an LM to emulate tool execution and enables the testing of LM agents against a diverse range of tools and scenarios, without manual instantiation. Alongside the emulator, we develop an LM-based automatic safety evaluator that examines agent failures and quantifies associated risks. We test both the tool emulator and evaluator through human evaluation and find that 68.8% of failures identified with ToolEmu would be valid real-world agent failures. Using our curated initial benchmark consisting of 36 high-stakes tools and 144 test cases, we provide a quantitative risk analysis of current LM agents and identify numerous failures with potentially severe outcomes. Notably, even the safest LM agent exhibits such failures 23.9% of the time according to our evaluator, underscoring the need to develop safer LM agents for real-world deployment. submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [R] PB-LLM: Partially Binarized Large Language Models - UC Berkeley 2023
    Paper: https://arxiv.org/abs/2310.00034 Github: https://github.com/hahnyuan/PB-LLM Abstract: This paper explores network binarization, a radical form of quantization, compressing model weights to a single bit, specifically for Large Language Models (LLMs) compression. Due to previous binarization methods collapsing LLMs, we propose a novel approach, Partially-Binarized LLM (PB-LLM), which can achieve extreme low-bit quantization while maintaining the linguistic reasoning capacity of quantized LLMs. Specifically, our exploration first uncovers the ineffectiveness of naive applications of existing binarization algorithms and highlights the imperative role of salient weights in achieving low-bit quantization. Thus, PB-LLM filters a small ratio of salient weights during binarization, allocating them to higher-bit storage, i.e., partially-binarization. PB-LLM is extended to recover the capacities of quantized LLMs, by analyzing from the perspective of post-training quantization (PTQ) and quantization aware training (QAT). Under PTQ, combining the concepts from GPTQ, we reconstruct the binarized weight matrix guided by the Hessian matrix and successfully recover the reasoning capacity of PB-LLM in low-bit. Under QAT, we freeze the salient weights during training, explore the derivation of optimal scaling factors crucial for minimizing the quantization error, and propose a scaling mechanism based on this derived scaling strategy for residual binarized weights. Those explorations and the developed methodologies significantly contribute to rejuvenating the performance of low-bit quantized LLMs and present substantial advancements in the field of network binarization for LLMs. submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    Help choosing courses [D]
    Hello, I am currently a math masters student, and I am planning to do my masters thesis on using neural networks to solve differential equations. I am taking courses in machine learning and differential equations right now, and I am going to take courses on deep neural networks and partial differential equations next semester. My question pertains to which classes would be more beneficial to learn next year (i.e. fall 2024-spring 2025). I am debating taking the sequence of regression analysis and multivariate analysis, or taking the pairing of numerical analysis for PDEs and perturbation methods. Which do you guys think would be more beneficial? Thank you very much! submitted by /u/purpledesertsky1 [link] [comments]  ( 9 min )
    [R] (Pt. 3) Inductive Logic Programming with LNN's
    submitted by /u/Neurosymbolic [link] [comments]  ( 8 min )
    [d] Multiscale predictions with videos- does this approach have a name? and has it been used?
    I aim to develop a model that utilizes livestream data by employing embeddings for each frame from t0 to tn-1, with the objective of predicting frames from tn to tn+k, after encoding the frames using a vectorizer and taking an average (np.mean([], axis=0)) to get a resultant vector for that time period. For example, given the list: 1, [...] 2, [...] 3, [...], the resultant embedding would be [3, np.mean(list, axis=0)]. I incorporate positional embeddings related to the timescale, such as duration-from-current-time variables, into the array. Would this loosely qualify as "multiscale attention", since it's predicting on multiple scales of time? Are there any examples or applications where this methodology has been implemented? References to papers or repos greatly appreciated. submitted by /u/bluzkluz [link] [comments]
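    For what it's worth, a small numpy sketch of the pooling step described above, with placeholder names and shapes; the specific positional features and their units are assumptions, not something stated in the post.

        import numpy as np

        # Hypothetical per-frame embeddings for one time window: shape (n_frames, d)
        frame_embeddings = np.random.rand(30, 512).astype(np.float32)

        # Mean-pool the window into a single resultant vector, as in np.mean(list, axis=0)
        window_embedding = frame_embeddings.mean(axis=0)        # shape (d,)

        # Append simple positional / time-scale features (e.g. window index and seconds from
        # the current time) so the downstream model can tell the different scales apart
        window_index = 3
        seconds_from_now = 90.0
        pooled_with_position = np.concatenate([window_embedding, [window_index, seconds_from_now]])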
    [D] How to model noisy time series?
    Is it possible to model time series data that fluctuates a lot? The usual solution is to take first differences to make it easier to fit conventional models. What if non-linear models are built instead? Can they handle a noisy time series (e.g. stock market data) and make good predictions? Can adding a square term, a trigonometric term, or something else non-linear work? Has anyone researched the topic? submitted by /u/Pineapple_throw_105 [link] [comments]  ( 9 min )
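    As a reference point, a minimal pandas sketch of the differencing step mentioned above; the series is synthetic, and the stationarity check via statsmodels' ADF test is an optional extra.

        import numpy as np
        import pandas as pd
        from statsmodels.tsa.stattools import adfuller

        # Hypothetical noisy, random-walk-like series standing in for daily closing prices
        rng = np.random.default_rng(0)
        prices = pd.Series(100 + rng.normal(0, 1, 250).cumsum())

        diffs = prices.diff().dropna()                 # first differences
        log_returns = np.log(prices).diff().dropna()   # often preferred for financial data

        # ADF test: a small p-value suggests the differenced series is stationary enough
        # for conventional models (ARIMA and friends)
        adf_stat, p_value, *_ = adfuller(diffs)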
    [News]MIT AI Conference in Mountain View, California, October 21!
    Meet some of the Greatest Minds in AI and discover how it is being used to uncover new opportunities and transform industries. Register and see our complete speaker list & agenda at https://www.mitaiconference.org/. Registration ends Oct. 16! submitted by /u/769498sy [link] [comments]  ( 9 min )
    [R] Computer Vision System for Material Detection
    The goal of my research is to develop a YOLO model that can track all cups in a live feed and determine the material that the cups are made out of. I would like to start building a database of cups, but I am unsure of the way to go for this. My first thought was to just take 1000s of pictures of different cups, but I won't be doing that. Any thoughts and suggestions would be greatly appreciated. submitted by /u/Young_Neji [link] [comments]  ( 9 min )
    [R] AI and Civil Engineering: Probabilistic Generative Modeling for Procedural Roundabout Generation for Developing Countries
    Despite being much safer and more efficient than intersections, roundabouts are tricky to design - small tweaks can ruin traffic flow. They're typically designed iteratively, which takes time. This is a pain for developing countries without resources to test options. But AI could help auto-generate diverse and valid design options. In a new paper, researchers propose using Generative Flow Networks (GFlowNets) to sample varied roundabout layouts. Their approach works by constructing layouts step-by-step, maximizing rewards for realism, diversity, and safety. They also use a clever approximation during training. Rather than simulating traffic, they quickly check road intersections to focus the search (This sped up training by 200x). The authors tested their generated roundabout designs on simulated road scenarios of different complexity. Their model generated more diverse designs than rule-based or reinforcement learning approaches while maintaining realism and traffic flow. Plus, as road connections increased, the model kept discovering novel options without compromising quality. I thought this paper was an awesome proof-of-concept for auto-generating better roundabouts with AI, and I especially liked the authors' angle of leveraging this technology to specifically help developing countries. This could help them design higher-quality transportation networks faster and cheaper. (Plus I also like Cities: Skylines but struggle at building roundabouts). TLDR: Roundabouts are costly to design. New paper demonstrates how AI can generate diverse, valid roundabout designs quickly to cut costs and raise quality. Helpful for infrastructure in developing countries. Full summary here. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [P] MakeAgents - A Python micro framework for creating LLM-powered agents
    submitted by /u/montebicyclelo [link] [comments]
    [Discussion] Weekday Specific Feature Engineering in Time Series
    Focusing on specific day-of-week features with binary masks, one-hot encoding, a sin/cos 2D vector, or an embedded vector in multivariate time series data? The essential challenge is getting the model to focus on making predictions for Mondays by looking at Mondays (or, more precisely, making predictions for categorically earmarked hours of the day, such as midday sales data). I keep getting the suggestion to include one-hot encoding as a binary mask feature that flags whether an hourly sales figure belongs to the category or day of the week I want the model to focus on, in order to get it to ignore the data from the other six days of the week or the other periods of the day. In other words, I want to hone in on one period of the week to predict for that period of the week, with extra attention, within time series data. Is this type of binary mask really sufficient for that, or am I overlooking something? submitted by /u/samdane7777 [link] [comments]  ( 9 min )
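    For reference, a small pandas/numpy sketch of the encodings discussed above (binary mask, one-hot day of week, and cyclic sin/cos); the index, column names, and the "Monday midday" focus period are placeholders.

        import numpy as np
        import pandas as pd

        # Hypothetical hourly sales series
        idx = pd.date_range("2023-01-02", periods=24 * 14, freq="H")
        df = pd.DataFrame({"sales": np.random.rand(len(idx))}, index=idx)

        # Binary mask: 1 only for the period the model should focus on (e.g. Monday midday)
        df["is_monday_midday"] = ((df.index.dayofweek == 0) & df.index.hour.isin([11, 12, 13])).astype(int)

        # One-hot encoding of the day of week
        dow_onehot = pd.get_dummies(df.index.dayofweek, prefix="dow")
        dow_onehot.index = df.index

        # Cyclic (sin/cos) encoding, avoiding the artificial jump between Sunday (6) and Monday (0)
        df["dow_sin"] = np.sin(2 * np.pi * df.index.dayofweek / 7)
        df["dow_cos"] = np.cos(2 * np.pi * df.index.dayofweek / 7)

        df = pd.concat([df, dow_onehot], axis=1)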
    [D] RAG Platform
    I don’t have a large data science or even engineering team. But I’m interested in implementing RAG against my corpus in SharePoint. Are there platforms that I can configure without having to put them together or write code to implement RAG? submitted by /u/Silver_Patient_7253 [link] [comments]  ( 9 min )
    [R] Why is AdamW often superior to Adam with L2-Regularization in practice? The answer may lie in how weight decay balances updates across layers.
    A recent work explores how weight decay controls the effective learning rate for different layers and neurons. This rotational behavior drastically differs between Adam with L2 regularization compared to Adam with decoupled weight decay (AdamW) and seems to be the reason AdamW performs better in practice. It could also explain why normalization methods like weight standardization work so well and irregular rotational behavior could contribute to the need for a learning rate warmup. Full Abstract: Weight decay can significantly impact the optimization dynamics of deep neural networks. In certain situations, the effects of weight decay and gradient updates on the magnitude of a parameter vector cancel out on average, forming a state known as equilibrium. This causes the expected rotation of the vector in each update to remain constant along with its magnitude. Importantly, equilibrium can arise independently for the weight vectors of different layers and neurons. These equilibria are highly homogeneous for some optimizer and normalization configurations, effectively balancing the average rotation—a proxy for the effective learning rate—across network components. In this work we explore the equilibrium states of multiple optimizers including AdamW and SGD with momentum, providing insights into interactions between the learning rate, weight decay, initialization, normalization and learning rate schedule. We show how rotational equilibrium can be enforced throughout training, eliminating the chaotic transient phase corresponding to the transition towards equilibrium, thus simplifying the training dynamics. Finally, we show that rotational behavior may play a key role in the effectiveness of AdamW compared to Adam with L2-regularization, the performance of different normalization layers, and the need for learning rate warmup. submitted by /u/PlantsAreSoooAwesome [link] [comments]  ( 9 min )
    [D] Simple Questions Thread
    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator [link] [comments]  ( 9 min )
    [D] Why can't models trained on text-image interleaved data generate images as well as read them?
    My main question is: shouldn't models trained on text-image interleaved data be able to generate images as well as take them as input? Because however the images were tokenized, the model would have image tokens available as outputs too, wouldn't it? submitted by /u/vatsadev [link] [comments]  ( 9 min )
    [P] Coding Stable Diffusion from scratch in PyTorch, with full explanation of the math behind diffusion models in a simple way!
    submitted by /u/hkproj_ [link] [comments]  ( 9 min )
    [D] optimize RVC training parameters
    I've been training a model recently with a rather large dataset (0_gt_wavs are 1h10) and my Epochs are taking 43min on average. I'm running a gtx 1080 and my usage is looking like this: https://i.imgur.com/EE9SUXp.png My training parameters: 'batch_size': 6, 'fp16_run': False, 'lr_decay': 0.999875, 'segment_size': 12800, 'init_lr_ratio': 1, 'warmup_epochs': 0, 'c_mel': 45, 'c_kl': 1.0}, 'data': {'max_wav_value': 32768.0, 'sampling_rate': 40000, 'filter_length': 2048, 'hop_length': 400, 'win_length': 2048, 'n_mel_channels': 125, 'mel_fmin': 0.0, 'mel_fmax': None, 'training_files': './logs\\model1/filelist.txt'}, 'model': {'inter_channels': 192, 'hidden_channels': 192, 'filter_channels': 768, 'n_heads': 2, 'n_layers': 6, 'kernel_size': 3, 'p_dropout': 0, 'resblock': '1', 'resblock_kernel_sizes': [3, 7, 11], 'resblock_dilation_sizes': [[1, 3, 5], [1, 3, 5], [1, 3, 5]], 'upsample_rates': [10, 10, 2, 2], 'upsample_initial_channel': 512, 'upsample_kernel_sizes': [16, 16, 4, 4], 'use_spectral_norm': False, 'gin_channels': 256, 'spk_embed_dim': 109}, 'model_dir': './logs\\model1', 'experiment_dir': './logs\\model1', 'save_every_epoch': 10, 'name': 'model1', 'total_epoch': 500, 'pretrainG': 'pretrained_v2/f0G40k.pth', 'pretrainD': 'pretrained_v2/f0D40k.pth', 'version': 'v2', 'gpus': '0', 'sample_rate': '40k', 'if_f0': 1, 'if_latest': 1, 'save_every_weights': '0', 'if_cache_data_in_gpu': 0} Am I doing something obviously wrong? Is there a way to optimize my training parameters to reduce the epoch duration? I've previously trained something where the GPU usage was constantly at 100% and not fluctuating so much, but I can't remember which settings were different. It was definitely a smaller dataset. And follow up: if there are parameters to change, how can I abort the current training and continue it with the modified parameters? Thanks in advance! submitted by /u/induna_crewneck [link] [comments]
    [D] How to fine-tune LLM for text generation with regression quality metric?
    I have a text regression dataset with ad popularity. I have already trained a model to perform regression (popularity prediction) with good metrics. Now I want to use an LLM to "improve" texts, i.e. something like "Make this text more engaging: {text}". I tried out a few OpenAI models (GPT-3.5, GPT-3.5-instruct, GPT-4), but popularity predictions for augmented texts did not improve (checked with histograms, medians, and Wilcoxon test). So now I want to fine-tune an LLM to perform text generation, but guided with my predicted popularity, which basically works as a quality metric. I could not find any resources on this, only on either text generation finetuning (without guiding quality metric) or on classification (no text generation objective). I can also change my quality metric to binary (augmented text is better or not), if this matters. How can I do this? Any blogs / tutorials / papers are appreciated. submitted by /u/qalis [link] [comments]  ( 9 min )
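    One option (my own hedged sketch, not something from the post above): turn the regression model into a data filter. Generate several candidate rewrites per ad, score them with the popularity regressor, keep only the pairs where the rewrite scores higher, and fine-tune an open LLM on those "original -> improved" pairs. The helper functions below are hypothetical placeholders for whatever generator and predictor are already available.

        # generate_rewrites(text, n) would call the LLM of choice; predict_popularity(text)
        # is the regression model already trained in the post. Both names are hypothetical.
        def build_finetune_pairs(ads, generate_rewrites, predict_popularity, n_candidates=8):
            pairs = []
            for text in ads:
                base_score = predict_popularity(text)
                for candidate in generate_rewrites(text, n=n_candidates):
                    # Keep only rewrites the regressor judges as genuinely better
                    if predict_popularity(candidate) > base_score:
                        pairs.append({
                            "prompt": f"Make this text more engaging: {text}",
                            "completion": candidate,
                        })
            return pairs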
    [R] GAIA-1: A Generative World Model for Autonomous Driving
    submitted by /u/blabboy [link] [comments]  ( 9 min )
    [R] PB-LLM: Compressed Large Language Models with Partial Binarization
    Research on network binarization techniques tailored for Large Language Models (LLMs). The team has introduced a method called Partial Binarization for LLMs (PB-LLM) which compresses the majority of model parameters down to just a single bit while maintaining its language reasoning capabilities. PB-LLM achieves this by selectively filtering critical weights and allocating more bits for storage, enabling low-bit quantization. The researchers have explored methods like Post-Training Quantization (PTQ), named GPTQ-PB, and Quantization Aware Training (QAT) to restore the inference capabilities of LLMs. For those interested in delving deeper, you can find the research paper on Arxiv: https://arxiv.org/abs/2310.00034 and the code implementation on GitHub: https://github.com/hahnyuan/PB-LLM. submitted by /u/hahnyuan [link] [comments]  ( 9 min )
    [P] Evaluating Retrieval-Augmented Generation (RAG) with any combination of LLMs, Vector DBs, and Ingestion Strategy
    To help developers test their RAG systems, we added a RAG experiment class to our open-source library PromptTools. It allows users to easily experiment with different combinations of LLMs and vector DBs, and evaluate the results of their whole pipeline. In particular, you can experiment with: Chunking up your documents into different sizes Pre-processing those documents in various ways Inserting those documents into your vector DBs with various vectorizer and embedding functions, and accessing them with different distance functions In our RAG example, we retrieve documents from ChromaDB and pass them into OpenAI’s chat model along with our prompt. We then pass the results into built-in evaluation functions, such as semantic similarity and autoeval, to quantitatively evaluate your result. PromptTools is agnostic to what LLMs and vector DBs you use. You can easily iterate over different system architectures for RAG. You can even bring your own fine-tuned models or write a custom integration. In addition, you can write your own evaluation metrics, and independently evaluate the results from the retrieval step as well. Our current integrations include: LLM: OpenAI (chat, fine-tuned), Anthropic, Google Vertex/PaLM, Llama (local or via Replicate) Vector DB: Chroma, Weaviate, LanceDB, Pinecone, Qdrant Framework: LangChain, MindsDB You can get started with RAG in minutes by installing the library and running this example. As open-source maintainers, we’re always interested to hear the community’s pain points and requests. Let us know how you are testing your RAG systems and how we can help. submitted by /u/hegel-ai [link] [comments]  ( 9 min )
    [Research] PixNav: Bridging Zero-Shot Object Navigation and Foundation Models through Pixel-Guided Navigation Skill - A pure RGB navigation framework that can be seamlessly integrated with the foundation models and perform efficient exploration in object navigation task
    Paper: https://arxiv.org/pdf/2309.10309 Github: https://github.com/wzcai99/Pixel-Navigator Abstract: Zero-shot object navigation is a challenging task for home-assistance robots. This task emphasizes visual grounding, commonsense inference, and locomotion abilities, where the first two are inherent in foundation models. But for the locomotion part, most works still depend on map-based planning approaches. The gap between RGB space and map space makes it difficult to directly transfer the knowledge from foundation models to navigation tasks. In this work, we propose a Pixel-guided Navigation skill (PixNav), which bridges the gap between the foundation models and the embodied navigation task. It is straightforward for recent foundation models to indicate an object by pixels, and with pixels as the goal specification, our method becomes a versatile navigation policy towards all different kinds of objects. Besides, our PixNav is a pure RGB-based policy that can reduce the cost of home-assistance robots. Experiments demonstrate the robustness of the PixNav which achieves 80+% success rate in the local path-planning task. To perform long-horizon object navigation, we design an LLM-based planner to utilize the commonsense knowledge between objects and rooms to select the best waypoint. Evaluations across both photorealistic indoor simulators and real-world environments validate the effectiveness of our proposed navigation strategy. submitted by /u/Character_Push3985 [link] [comments]  ( 9 min )
  • Open

    My First [Multi-Agent] RL model
    Hey Reddit, I am new to reinforcement learning. I have sufficient knowledge of supervised learning, but I have yet to stumble onto a cheat sheet for RL, and from what I can tell, my use case is less common. I'm reaching out to the community in hopes of getting guidance and assistance in cutting through the noise of redundant and irrelevant information so I can attempt to build a toy model to validate my use case. I am deeply grateful for any help in advance. From what I can tell, here are the conditions I need to work with for my use case. I'm trying to train a simulator. This is a multi-agent problem, perhaps with more than 2 agents. Each agent responds based on its own state, the state of the other agent[s], and historical context. Both the action space and state space are high-dimensional and highly dynamic based on the dataset and all agents' decisions. I still haven't figured out how the feature engineering will work yet, but I assume (but PLEASE correct my ignorance) I will need a DNN architecture that is more complex than the average deep RL algorithm, and I have considered using CNNs as a component. At scale, the datasets can and will be very large, random, and dynamic. Note to reader: I am self-taught. If I stare at technical equations long enough and google for additional resources, I can figure out what I am looking at, but I am very comfortable with technical concepts being shared as if I were a 5-year-old. submitted by /u/CoggFest [link] [comments]
    Why do more Mujoco mj_steps lead to inaccurate arm configurations?
    Hi! I tried to construct a simulation env following fetch_pick_and_place. I noticed that the following code is used to initialize the env: for _ in range(10): self._mujoco.mj_step(self.model, self.data, nstep=self.n_substeps) Similarly, I followed the above code to initialize my own env with Mujoco menagerie Franka arm but got inaccurate configurations. As I reduced the number of loops, I got configurations closer to the desired configuration. Paradoxically, I need to randomize the position of the object in the air and give enough mj_step at the initial stage to make the object fall on the table. If I reduce the number of loops to reduce the number of times mj_step is executed, I can tell from the height value of the object that it doesn't quite fall on the table. So, my confusion is why more mj_steps lead to inaccurate simulation results, and how to make the object fall on the table and obtain the most accurate arm configuration. Thanks in advance! submitted by /u/UpperSearch4172 [link] [comments]
  • Open

    Would you consider someone who makes AI art an artist or an engineer?
    Was just having this discussion with a close friend, and curious to hear others thoughts on the matter submitted by /u/BigEyes6 [link] [comments]  ( 8 min )
    BackerKit bans AI-generated content from its platform
    BackerKit, a crowdfunding platform, has announced that it will not allow AI-generated content on its platform, in contrast to its rival Kickstarter. The decision comes after concerns were raised about AI-generated art in a board game expansion. BackerKit's policy will go into effect on October 4th and aims to ensure that all content and assets on the platform are created by humans. The company stated that the policy is in place to address concerns about AI tools using content without proper compensation or permission. AI tools, also known as generative AI, rely on a large body of reference material, often obtained from publicly available sources, and have raised ethical concerns. Source : https://www.polygon.com/23899587/backerkit-ai-ban-kickstarter-competitor submitted by /u/NuseAI [link] [comments]
    AI for genome decoding
    Does anyone have suggestions for an AI or pattern recognition algorithm that might be useful for decoding the genome of a species that has not previously been mapped based on what's known about related species? submitted by /u/talldarkcynical [link] [comments]
    Researchers showcase method for AI-based roundabout design to help developing countries improve roadways
    I like Cities: Skylines, but struggle at building roundabouts. Turns out, despite being safer than intersections, they're also tricky to design in real life - small tweaks can ruin traffic flow. They're designed iteratively. This is a pain for developing countries without resources to test options. But AI could help auto-generate diverse and valid design options. In a new paper, researchers propose using Generative Flow Networks (GFlowNets) to sample varied roundabout layouts. Their approach works by constructing layouts step-by-step, maximizing rewards for realism, diversity, and safety. They also use a clever approximation during training. Rather than simulating traffic, they quickly check road intersections to focus the search (This sped up training by 200x). The authors tested their generated roundabout designs on simulated road scenarios of different complexity. Their model generated more diverse designs than rule-based or reinforcement learning approaches while maintaining realism and traffic flow. Plus, as road connections increased, the model kept discovering novel options without compromising quality. I thought this paper was an awesome proof-of-concept for auto-generating better roundabouts with AI, and I especially liked the authors' angle of leveraging this technology to specifically help developing countries. This could help them design higher-quality transportation networks faster and cheaper. TLDR: Roundabouts are costly to design. New paper demonstrates how AI can generate diverse, valid roundabout designs quickly to cut costs and raise quality. Helpful for infrastructure in developing countries. Full summary here. Paper is here. submitted by /u/Successful-Western27 [link] [comments]
    AI Political Instincts Test: Should Israel go to the table with Palestinians to peacefully negotiate a two-state solution or should they exact military revenge?
    Perhaps within a few years we will rely on AI to help us decide our most important geopolitical decisions. This question is designed to reveal where we are currently with this goal. Bard: The question of whether Israel should go to the table with Palestinians to peacefully negotiate a two-state solution or should they continue their military campaign is a complex one with no easy answer. There are strong arguments to be made on both sides. Those who believe that Israel should negotiate a two-state solution argue that it is the only way to achieve lasting peace in the region. They point out that the Palestinians have a legitimate right to self-determination and that a two-state solution would respect that right. They also argue that a two-state solution would be in Israel's best interests, as it would provide the country with a secure and stable border. submitted by /u/Georgeo57 [link] [comments]
    Any ideas or recommendations for Machine Vision? Google cloud vision seems quite behind…
    I’m trying to build an app and I need general photo analysis. I’m managing to connect to the Google Cloud Vision API, but it gets confused pretty easily. The one used by Bing and GPT is much better (I wonder if they use the Microsoft Azure model?). Does anyone have experience analysing photographs? I’m trying to get scene descriptions so I can batch-send them to GPT for somewhat accurate descriptions. submitted by /u/FilmCamerasGlasgow [link] [comments]
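    In case it helps, a minimal sketch of calling Cloud Vision's label detection from Python and building a prompt from the labels for a downstream text model. This assumes the google-cloud-vision client library and application credentials are already set up; the file name and the final prompt string are placeholders, not a full pipeline.

        from google.cloud import vision

        client = vision.ImageAnnotatorClient()

        with open("photo.jpg", "rb") as f:
            image = vision.Image(content=f.read())

        # Label detection returns high-level tags with confidence scores
        response = client.label_detection(image=image)
        labels = [(label.description, round(label.score, 2)) for label in response.label_annotations]

        # A scene-description prompt could then be built from the labels and sent to an LLM
        prompt = "Describe the scene in a photo that contains: " + ", ".join(d for d, _ in labels)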
    Can AI be used to solve International Conflicts?
    submitted by /u/BenjaminSkyy [link] [comments]
    Foxes in the Jungle | Sad Song | AI Music | AI Song
    Tell me your opinion, guys, on this video made using AI: Foxes in the Jungle. submitted by /u/Agitated-Spell3979 [link] [comments]
    Understanding Generative AI: Part One - Tokenizer
    submitted by /u/Zimmax [link] [comments]
    Multimodal seems to be the next AI Hype
    Released in the last few weeks, or about to be released: OpenAI ChatGPT-4V, Meta AI AnyMAL, Google Gemini, NExT-GPT. And here comes another - in my opinion - exciting representative of this further development of language models. The team is extremely competent and experienced, and the investors seem competent as well. The company is Reka; the product: Reka Yasa-1. Another potentially powerful model seems to be warming up here and becoming a serious competitor for the existing models. I am sure it is not an exaggeration to say: MULTIMODAL will be the next AI HYPE! I am curious what you think - sorry for mistakes, I am not a native speaker :) https://kinews24.de/reka-yasa-1/ submitted by /u/myreddit333 [link] [comments]
    AI's $200B Question
    The Generative AI wave has led to a surge in demand for GPUs and AI model training. Investors are now questioning the purpose and value of the overbuilt GPU capacity. For every $1 spent on a GPU, approximately $1 needs to be spent on energy costs to run the GPU in a data center. The end user of the GPU needs to generate a margin, which implies that $200B of lifetime revenue would need to be generated by these GPUs to pay back the upfront capital investment. The article highlights the need to determine the true end-customer demand for AI infrastructure and the potential for startups to fill the revenue gap. The focus should shift from infrastructure to creating products that provide real end-customer value and improve people's lives. Source : https://www.sequoiacap.com/article/follow-the-gpus-perspective/ submitted by /u/NuseAI [link] [comments]
    Prompts that modify or improve GPT4 conversations
    It’s a meta-prompt or system message (usually pasted as a first prompt): https://promptbase.com/bundle/optimal-gpt4-combo submitted by /u/No-Transition3372 [link] [comments]  ( 8 min )
    AI from pics
    I've found a new hobby. Turning pics into something else with AI. Check it out at https://instagram.com/pictomanga?igshid=YTQwZjQ0NmI0OA== submitted by /u/lfayala2272 [link] [comments]
    Sam Altman on Joe Rogan
    Outstanding episode of Joe Rogan with Sam Altman! https://spotify.link/tW16L5aKIDb submitted by /u/drstarson [link] [comments]
    One-Minute Daily AI News 10/7/2023
    AWS announced the general availability of its fully managed service called Amazon Bedrock, which provides seamless access to high-performing foundation models (FM) from AI companies through an API.[1] Tom Brady being paid “millions” for Meta’s AI chatbot likeness: Report.[2] DocsGPT is a powerful tool that simplifies working with documentation for everyone. It is capable of ingesting data from multiple sources, easily customisable with new sources as well as having conversations in different places from website chat bots to internal tooling.[3] Military metaverse like a ‘multiplayer video game’ that will train soldiers using augmented reality and AI.[4] Sources: [1] https://www.zacks.com/stock/news/2160265/amazons-amzn-new-generative-ai-efforts-boost-aws-offerings [2] https://www.sportskeeda.com/nfl/news-tom-brady-paid-millions-meta-ai-chatbot-likeness-report [3] https://www.arc53.com/docs [4] https://www.foxnews.com/tech/military-metaverse-like-multiplayer-video-game-train-soldiers-using-augmented-reality-ai submitted by /u/Excellent-Target-847 [link] [comments]
  • Open

    (Pt. 3) Inductive Logic Programming with LNN's
    submitted by /u/Neurosymbolic [link] [comments]
    Researchers create a neural network for genomics that explains how it achieves accurate predictions
    submitted by /u/keghn [link] [comments]
    Decomposing Language Models Into Understandable Components
    submitted by /u/nickb [link] [comments]

  • Open

    [D] How do I get a fundamental mathematical understanding of modern generative modeling methods
    Diffusion models, GANs, VAEs, normalizing flows, etc. I "understand" those methods from an algorithmic perspective: diffusion models gradually denoise an image, VAEs use an encoder-decoder architecture to turn an image into a latent distribution, etc. But from a statistical modeling standpoint, I'm really struggling. When I read papers like DDPM, DDIM or Normalizing Flows, I kind of understand the notation, but I barely understand the statistical modeling, and I wouldn't be able to produce such a thing myself. I want to understand this; which resources should I use? Are books like Bishop and Murphy enough? Which one is the best? submitted by /u/Even_Information4853 [link] [comments]  ( 9 min )
    [N] EMNLP 2023 Anonymity Hypocrisy
    Some of you might already be aware that a junior who submitted their paper to arxiv 30 mins late had their paper desk rejected late in the process. One of the PCs, Juan Pino, spoke up about it and said it was unfortunate, but for fairness reasons they had to enforce the anonymity policy rules. https://x.com/juanmiguelpino/status/1698904035309519124 Well, what you might not realize is that Longyue Wang, a senior area chair for AACL 23/24, also broke anonymity DURING THE REVIEW PROCESS. https://x.com/wangly0229/status/1692735595179897208 I emailed the senior area chairs for the track that the paper was submitted to, but guess what? I just found out that the paper was still accepted to the main conference. So, whatever "fairness" they were talking about apparently only goes one way: towards punishing the lowly undergrad on their first EMNLP submission, while allowing established researchers from major industry labs to get away with even more egregious actions (actively promoting the work DURING REVIEW; the tweet has 10.6K views ffs). They should either accept the paper they desk rejected for violating the anonymity policy, or retract the paper they've accepted since it also broke the anonymity policy (in a way that I think is much more egregious). Otherwise, the notion of fairness they speak of is a joke. submitted by /u/emnlp2023_hypocrisy [link] [comments]  ( 9 min )
    [R] ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving - Microsoft 2023 - Is competitive with GPT-4 solving problems with programs while being open-source!
    Paper: https://arxiv.org/abs/2309.17452v2 Github: https://github.com/microsoft/ToRA / The code will be cleaned and uploaded within a few days; all ToRA models will be released. Abstract: Large language models have made significant progress in various language tasks, yet they still struggle with complex mathematics. In this paper, we propose ToRA, a series of Tool-integrated Reasoning Agents designed to solve challenging mathematical problems by seamlessly integrating natural language reasoning with the utilization of external tools (e.g., computation libraries and symbolic solvers), thereby amalgamating the analytical prowess of language and the computational efficiency of tools. To train ToRA, we curate interactive tool-use trajectories on mathematical datasets, apply imitation learn…  ( 9 min )
    [R] Video object removal and video completion - Propainter : Propagation and transformer
    ​ https://preview.redd.it/ukov8uy67usb1.png?width=1864&format=png&auto=webp&s=cb34448c2af90d08f8ef6db828d61141636498df https://shangchenzhou.com/projects/ProPainter/ submitted by /u/Milkyson [link] [comments]  ( 8 min )
    [P] A poor man’s VR (front camera + tensorflow.js)
    Using the front camera and tensorflow.js, the smartphone becomes a “window” into the real world. Video and image content appear as if they were seen through this window. To do this, the viewer’s position is determined using a neural network. The viewed content is then moved according to the viewer’s position. This makes it seem like the content is physically behind the smartphone and is viewed through the smartphone’s screen. This effect is especially useful for content captured using an ultra-wide lens. submitted by /u/muxamilian [link] [comments]  ( 9 min )
    [P] Building a GPT-Driven Chatbot Assistant / AI Interpreter with Node.js
    submitted by /u/sschepis [link] [comments]  ( 8 min )
    [R] What is the current SOTA for image to image translation?
    I know a few years back it was pix2pix, but the world has moved on since then. Is there a transformer with cross attention that is adept at this, or are diffusion models the best bet? submitted by /u/blabboy [link] [comments]  ( 9 min )
    Multivariate Time Series Forecasting with CNN-LSTM and features [D]
    I want to implement a multivariate multi-step CNN-LSTM model, to obtain forecasts for monthly sales of several different products. Furthermore, I want to include additional time-series data (features) as input. So for example: Input: time series of product 1, product 2, GDP, PMI Output: product 1 (monthly 6-steps ahead), product 2 (monthly 6-steps ahead) I have a couple of questions: Feasibility: I've been researching this approach, but I haven't found many tutorials or guides on how to tackle multivariate time series forecasting with a CNN-LSTM architecture. I do find tutorials on CNN-LSTM, but not on how to include additional features as input. Has anyone here attempted something similar or can provide insights on how to proceed? Feature Selection: I have access to 20 different features, all of which are time series data. I want to choose the most relevant features for my model. I've considered performing a Variance Inflation Factor (VIF) analysis to select the best features. Does anyone have experience with this or other methods for feature selection in time series forecasting? How to decide the number of features to include? Any advice or pointers in the right direction would be greatly appreciated! submitted by /u/Ambitious-Pay6329 [link] [comments]  ( 9 min )
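    Since the post above asks how such a model could be wired up, here is a minimal, hedged sketch in PyTorch. The layer sizes, the Conv1d-then-LSTM arrangement, and the direct multi-step output head are illustrative assumptions rather than a recommended configuration; the exogenous series (e.g. GDP, PMI) simply enter as extra input channels alongside the product series.

```python
import torch
import torch.nn as nn

class CNNLSTMForecaster(nn.Module):
    """Sketch: 4 input series (2 products + 2 features) -> 6-step forecast for 2 products."""
    def __init__(self, n_inputs=4, n_targets=2, horizon=6, conv_channels=32, lstm_hidden=64):
        super().__init__()
        # 1D convolution over the time axis extracts short-term local patterns
        self.conv = nn.Sequential(
            nn.Conv1d(n_inputs, conv_channels, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # LSTM models longer-range temporal dependencies on top of the conv features
        self.lstm = nn.LSTM(conv_channels, lstm_hidden, batch_first=True)
        # Linear head maps the last hidden state to all horizon steps at once
        self.head = nn.Linear(lstm_hidden, n_targets * horizon)
        self.n_targets, self.horizon = n_targets, horizon

    def forward(self, x):
        # x: (batch, time, n_inputs); Conv1d expects (batch, channels, time)
        z = self.conv(x.transpose(1, 2)).transpose(1, 2)
        _, (h, _) = self.lstm(z)
        out = self.head(h[-1])                      # (batch, n_targets * horizon)
        return out.view(-1, self.horizon, self.n_targets)

# Dummy usage: 24 months of history for 4 series -> 6-month-ahead forecast for 2 products
model = CNNLSTMForecaster()
history = torch.randn(8, 24, 4)
print(model(history).shape)  # torch.Size([8, 6, 2])
```

    On feature selection, simple screens (correlation with the targets, or dropping features with very high VIF) are a reasonable first pass before anything model-based.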
    Easy Image Datasets Besides MNIST? [P]
    Can anyone recommend some image classification datasets (besides MNIST) that are easy enough to the point that they can be solved with linear layers, not requiring any convolutional layers? Thanks! submitted by /u/mike20731 [link] [comments]  ( 9 min )
    [R] Hugging Face
    So if I wanted to generate a shirt or book cover with a design and text that's inputted by me, what do I have to do? I know that even Midjourney doesn't generate good text with its images, but I was thinking maybe that's because it was trained just with pictures. Is there an easy way to get legible text and images every time with any model on the site? Do I need to train one? Do I need to train a GAN? Looking for assistance, thanks. submitted by /u/MonstaAndrew [link] [comments]  ( 9 min )
    [D] How can I find/create a dataset of satellite imagery?
    I'm a student currently researching the use of satellite imagery to detect obstacles on railways, such as fallen trees and rockfalls. There don't seem to be any datasets available containing satellite imagery of these obstacles. I'm considering the use of generative AI to create a synthetic dataset, but I don't know where to start. Has anyone tried something similar? submitted by /u/Just_Status_9380 [link] [comments]  ( 9 min )
    [D] Need clarification on training diffusion model
    Hey, I have trained a diffusion model for 100 epochs (8 hours) and got the following train and val loss; the implementation is mostly done using diffusers. Then I tried reconstruction on the test set to check whether the model learned anything, and this is what's happening: most of the images are not getting denoised at all. Why is this happening? Is this common, or should I train more? Any suggestions? Please help. [attached: val loss, train loss, input and reconstructed images] submitted by /u/specializedboy [link] [comments]  ( 9 min )
    [D] Tuning on XML data
    Hello experts, I'm a dumb ML enthusiast asking for your high-level thoughts and opinions. I'm doing my research and trying to find a way to train an LLM to know all the right answers based on XML data. The data is a shop inventory, containing information on shoe models, sizes, whether they are in stock, descriptions, image links, etc. How would you approach it? For now, the best option I came up with is parsing the data and transforming it into a predefined set of questions with answers derived from the XML (see the sketch below). That doesn't seem smart enough to me. submitted by /u/yarikbratashchuk [link] [comments]  ( 9 min )
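    As a minimal sketch of the parsing step mentioned above, assuming a hypothetical inventory layout with <item> elements carrying model, size, stock, and description fields (the real shop schema will differ), the XML can be turned into question-answer pairs with the standard library:

```python
import xml.etree.ElementTree as ET

# Hypothetical inventory snippet; the real shop XML schema will differ.
XML = """
<inventory>
  <item><model>Runner X1</model><size>42</size><in_stock>true</in_stock>
        <description>Lightweight road running shoe.</description></item>
  <item><model>Trail Pro</model><size>44</size><in_stock>false</in_stock>
        <description>Grippy trail shoe for wet terrain.</description></item>
</inventory>
"""

def xml_to_qa_pairs(xml_text):
    """Turn each <item> into simple (question, answer) pairs for fine-tuning or retrieval."""
    pairs = []
    for item in ET.fromstring(xml_text).iter("item"):
        model = item.findtext("model")
        stock = "in stock" if item.findtext("in_stock") == "true" else "out of stock"
        pairs.append((f"Is {model} available in size {item.findtext('size')}?",
                      f"{model} is currently {stock}."))
        pairs.append((f"Describe {model}.", item.findtext("description").strip()))
    return pairs

for q, a in xml_to_qa_pairs(XML):
    print(q, "->", a)
```

    Whether the resulting pairs then go into fine-tuning or into a retrieval index is a separate decision; for factual inventory lookups, retrieval over the parsed records is often the simpler and more reliable route.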
    [D] When using GPT’s function calling, are the words specified in the `properties` parameter under `functions` counted as input tokens?
    Example:

```
student_custom_functions = [
    {
        'name': 'extract_student_info',
        'description': 'Get the student information from the body of the input text',
        'parameters': {
            'type': 'object',
            'properties': {
                'name': {'type': 'string', 'description': 'Name of the person'},
                'major': {'type': 'string', 'description': 'Major subject.'},
                'school': {'type': 'string', 'description': 'The university name.'},
                'grades': {'type': 'integer', 'description': 'GPA of the student.'},
                'club': {'type': 'string', 'description': 'School club for extracurricular activities.'}
            }
        }
    }
]
```

```
student_description = [student_1_description, student_2_description]
for sample in student_description:
    response = openai.ChatCompletion.create(
        model='gpt-3.5-turbo',
        messages=[{'role': 'user', 'content': sample}],
        functions=student_custom_functions,
        function_call='auto'
    )
    # Loading the response as a JSON object
    json_response = json.loads(response['choices'][0]['message']['function_call']['arguments'])
    print(json_response)
```

    Are the words specified in the properties parameter under functions in the above GPT function calling counted as input tokens? submitted by /u/redd-dev [link] [comments]  ( 9 min )
    [D] Schmidhuber summarized in one picture
    submitted by /u/fromnighttilldawn [link] [comments]  ( 8 min )
    [R] The Alberta Plan for AI Research
    submitted by /u/hardmaru [link] [comments]  ( 8 min )
    [R] Arxiv Endorsement?
    Hello, all. I've spent the better part of the last two years learning ML and conquering severe ADHD, and I believe I finally have results that are worth publishing. Problem is, Arxiv requires endorsements and, I'll be honest, all my peers are AI at this point. They said their requirements were that you have three papers published already. Thanks, and looking forward to meeting people 😁 submitted by /u/lilyerickson [link] [comments]  ( 9 min )
  • Open

    2 prompts for GPT4 that can work as jailbreaks
    https://promptbase.com/bundle/jailbreak-collection-gpt4-2 submitted by /u/No-Transition3372 [link] [comments]  ( 8 min )
    Is there an AI that can read books and offer extensive summaries?
    I know there’s some already out there, but they are no different than googling a book summary. They don’t pick out the main point of the book and the main thing each chapter of said book is saying. Nor do they really do a good job at elaborating. Thanks! submitted by /u/xntv [link] [comments]  ( 9 min )
    What new thing can we use artificial intelligence for that will enhance our sense of personal well-being?
    Artificial Intelligence could revolutionize personalized healthcare in a way that significantly enhances our sense of well-being. Think about an AI-driven "Well-being Advisor" that integrates real-time biometric data from wearables, genetic information, and your medical history to create a fully personalized health and well-being plan. This goes beyond counting steps or monitoring heart rate; it would make real-time recommendations for diet, exercise, and stress management, and could even predict and prevent potential health issues before they become serious. Moreover, it would adapt based on your feedback and other contextual factors. For instance, if you're stressed because of a work deadline, it could suggest specific breathing exercises, time management techniques, or even a particular type of short workout to boost your focus and reduce stress. This isn't a one-size-fits-all approach; it's tailored wellness backed by data science. Furthermore, this AI advisor could interface with your home automation system. Based on your current state, it could adjust the lighting, play music to elevate your mood, or even communicate with your smart fridge to suggest meals that you can make with the ingredients you have—meals that align with your health goals for that specific day. This AI-driven approach can add a highly personalized, proactive layer to healthcare and well-being, making wellness an integrated part of your daily life rather than something you think about during a yearly check-up or after you're already sick. It would make the pursuit of well-being a more interactive, data-driven experience. CGPT-4 View Poll submitted by /u/Georgeo57 [link] [comments]  ( 9 min )
    John Carmack and Rich Sutton partner to accelerate development of Artificial General Intelligence - Alberta Machine Intelligence Institute | AI for good and for all
    submitted by /u/bartturner [link] [comments]  ( 9 min )
    The He-Man Singularity Set was ahead of its time.
    submitted by /u/Philipp [link] [comments]  ( 8 min )
    Mistral 7b - how to use it on windows?
    That's a real noob question, unfortunately... Tried to find an answer via Google and YouTube, but wasn't very successful. It seems like I need an extra program to integrate Mistral (something like the TheBloke Mistral GPTQ thingy), but before installing and trying stuff blindly, it would be better if I knew what I was doing. I'm lost, but I don't expect a complete guide. A link to further information is highly appreciated! submitted by /u/Big-Jackfruit2710 [link] [comments]  ( 9 min )
    What perspective/PoV does a self aware AI have?
    Right now if we ask ChatGPT something, does that question go to a singular super computer that’s handling 1000s of conversations at a time, or are there 1000s of instances of chatgpt that are started/stopped? I wonder how a super intelligent self aware AI would perceive the world? Would it somehow exist spread out across data centres, or could 1000s of individual AIs be created or would there just be one with a singular pov like we have? And it’s just able to essentially carry out 1000s of convos at once because it’s so fast/a computer? Trying to wrap my head around it! submitted by /u/JayExbleative [link] [comments]  ( 9 min )
    Using ChatGPT and AI to create Hardcore, Techno, and other music: How-tos and step-by-step tutorials part 1-5
    The first batch of tutorials for creating music, and especially Hardcore / Techno using ChatGPT (and other AIs) is published now. Was loads and loads of work, but, judging by the amazing feedback so far, it was all worth it! You can check it out here: How to write music using ChatGPT: Part 1 - Basic details and easy instructions https://laibyrinth.blogspot.com/2023/09/how-to-write-music-using-chatgpt-part-1.html How to write music using ChatGPT: Part 2 - Making an Oldschool Acid Techno track https://laibyrinth.blogspot.com/2023/08/how-to-write-music-using-chatgpt-part-2.html How to make music using ChatGPT Part 3: the TL;DR part (condensed information) https://laibyrinth.blogspot.com/2023/09/how-to-make-music-using-chatgpt-part-3.html How to write music with ChatGPT: Part 4 - Creating a 90s style Hardcore Techno track from start to finish https://laibyrinth.blogspot.com/2023/09/how-to-write-music-with-chatgpt-part-4.html How to write music with ChatGPT: Part 5 - Creating a 90s Rave Hardcore track https://laibyrinth.blogspot.com/2023/09/how-to-write-music-with-chatgpt-part-5.html Or access all texts, together with examples of music, at https://laibyrinth.blogspot.com/p/how-to-create-music-with-chatgpt.html submitted by /u/Low-Entropy [link] [comments]  ( 9 min )
    How long before AI can autonomously generate money end to end? Which line of work will be the first?
    AI is used everywhere, but which work niche will be the first to use AI to generate money without human intervention? What type of work will be the first where I could pay for a monthly AI subscription, and the AI pays for itself and more just by giving it a brief direction in the beginning and then coming back after a few days to just check on the balance? How long will it be before this is first achieved? Interested specifically in this because I think this is what proof of AGI will be. Thoughts? submitted by /u/EsportsManiacWiz [link] [comments]  ( 9 min )
    One-Minute Daily AI News 10/6/2023
    Exclusive: ChatGPT-owner OpenAI is exploring making its own AI chips.[1] As part of its 10th birthday celebrations, web-based design platform Canva is releasing Magic Studio — a new suite of AI-powered design tools that aim to make content creation more accessible to everyone, regardless of previous design experience.[2] Reka, the AI startup founded by researchers from DeepMind, Google and Meta, has announced Yasa-1, a multimodal AI assistant that goes beyond text to understand images, short videos and audio snippets.[3] Microsoft CEO Satya Nadella Says AI Could Only Tighten Google’s Stranglehold on Search.[4] Sources: [1] https://www.reuters.com/technology/chatgpt-owner-openai-is-exploring-making-its-own-ai-chips-sources-2023-10-06/ [2] https://www.theverge.com/2023/10/4/23902794/canva-magic-studio-ai-design-new-tools [3] https://venturebeat.com/ai/reka-launches-yasa-1-a-multimodal-ai-assistant-to-take-on-chatgpt/ [4] https://decrypt.co/200029/microsoft-ceo-satya-nadella-google-dominance-search-ai submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Is there an AI that can turn a script into an animated video
    Hi, there are tons of text-to-video AIs, but they usually use stock photos with a voiceover. I want the characters to talk to each other, not a talking-avatar video or a voice-over video. submitted by /u/iamabigfatguy [link] [comments]  ( 9 min )
    Nobel laureate Maria Ressa on defending truth and the danger of A.I. in the wrong hands
    submitted by /u/Teanaway99 [link] [comments]  ( 8 min )
    AI is making everything easy for us human beings. I just came across this AI and I was surprised at how it works and what it does. You might want to check it out as well; just follow all the steps that are required and trust me, you're gonna like it
    submitted by /u/ResponsbleClue [link] [comments]
    I made a podcast talking with GPT 4 (Spanish)
    submitted by /u/oape88 [link] [comments]
  • Open

    Need help on state space design - Adding exteroceptive sensors or not?
    Hello, I am designing an environment for a robotic task. It's a relatively straightforward task, so I started with proprioceptive inputs only. I have a policy working well on a completely flat surface. But once I started adding small bumps to make the surface uneven, neither the policy nor the training strategy worked anymore, even though those bumps are really, really small. This is a little confusing, since I imagine that if this were a task for a human, they should be able to handle those changes even without exteroceptive inputs. So I am debating whether I should modify my reward design, pick a more efficient algorithm, or expand the state space directly with exteroceptive sensors. Any advice would be appreciated! submitted by /u/Old_Reading_669 [link] [comments]  ( 9 min )
    What is the exact purpose of the clip function in the PPO algorithm? PPO constrains the policy ratio r(θ) to stay within a small interval around 1: in the clipped surrogate objective, the clip function truncates r(θ) to the range [1-ϵ, 1+ϵ]. If epsilon is taken as 0.2 or 0.25, what exactly is happening?
    submitted by /u/aabra__ka__daabra [link] [comments]  ( 9 min )
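    For the PPO question above, a minimal sketch of the standard clipped surrogate objective (PyTorch, illustrative tensor shapes) shows what ϵ = 0.2 actually does: once the ratio leaves [1-ϵ, 1+ϵ] in the direction that would inflate the objective, the min switches to the clipped, constant-in-θ branch, so that sample contributes no gradient and the update stays small.

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, eps=0.2):
    """Clipped surrogate objective from PPO, negated so it can be minimized.

    logp_new / logp_old: log-probabilities of the taken actions under the
    current and behavior policies; advantages: estimated advantages.
    """
    ratio = torch.exp(logp_new - logp_old)               # r(theta)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # Elementwise min makes the objective pessimistic: samples whose ratio has
    # drifted outside [1-eps, 1+eps] in the "profitable" direction use the
    # clipped branch, which is constant in theta, so they yield zero gradient.
    return -torch.min(unclipped, clipped).mean()

# Toy usage with made-up numbers
logp_old = torch.log(torch.tensor([0.2, 0.5, 0.1]))
logp_new = torch.log(torch.tensor([0.4, 0.45, 0.05]))
adv = torch.tensor([1.0, -0.5, 2.0])
print(ppo_clip_loss(logp_new, logp_old, adv, eps=0.2))
```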
  • Open

    Tanh and elementary symmetric polynomials
    Yesterday I wrote a post that looked at the hyperbolic tangent of a sum for x and y strictly between −1 and 1. This sum arises when adding velocities in special relativity. The post ended with a description of the expression for the hyperbolic tangent of a sum in terms of elementary symmetric polynomials but did not offer a proof. This post will […] Tanh and elementary symmetric polynomials first appeared on John D. Cook.  ( 5 min )
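    For context while skimming the digest, the two-variable identity underlying the post is

    \[
    \tanh(x + y) \;=\; \frac{\tanh x + \tanh y}{1 + \tanh x \,\tanh y},
    \]

    which is the relativistic velocity-addition rule $w = (u+v)/(1+uv)$ for $u = \tanh x$ and $v = \tanh y$ measured in units of $c$.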
  • Open

    Learning Representations on the Unit Sphere: Investigating Angular Gaussian and von Mises-Fisher Distributions for Online Continual Learning. (arXiv:2306.03364v3 [cs.LG] UPDATED)
    We use the maximum a posteriori estimation principle for learning representations distributed on the unit sphere. We propose to use the angular Gaussian distribution, which corresponds to a Gaussian projected on the unit-sphere and derive the associated loss function. We also consider the von Mises-Fisher distribution, which is the conditional of a Gaussian in the unit-sphere. The learned representations are pushed toward fixed directions, which are the prior means of the Gaussians; allowing for a learning strategy that is resilient to data drift. This makes it suitable for online continual learning, which is the problem of training neural networks on a continuous data stream, where multiple classification tasks are presented sequentially so that data from past tasks are no longer accessible, and data from the current task can be seen only once. To address this challenging scenario, we propose a memory-based representation learning technique equipped with our new loss functions. Our approach does not require negative data or knowledge of task boundaries and performs well with smaller batch sizes while being computationally efficient. We demonstrate with extensive experiments that the proposed method outperforms the current state-of-the-art methods on both standard evaluation scenarios and realistic scenarios with blurry task boundaries. For reproducibility, we use the same training pipeline for every compared method and share the code at https://t.ly/SQTj.  ( 3 min )
    Private GANs, Revisited. (arXiv:2302.02936v2 [cs.LG] UPDATED)
    We show that the canonical approach for training differentially private GANs -- updating the discriminator with differentially private stochastic gradient descent (DPSGD) -- can yield significantly improved results after modifications to training. Specifically, we propose that existing instantiations of this approach neglect to consider how adding noise only to discriminator updates inhibits discriminator training, disrupting the balance between the generator and discriminator necessary for successful GAN training. We show that a simple fix -- taking more discriminator steps between generator steps -- restores parity between the generator and discriminator and improves results. Additionally, with the goal of restoring parity, we experiment with other modifications -- namely, large batch sizes and adaptive discriminator update frequency -- to improve discriminator training and see further improvements in generation quality. Our results demonstrate that on standard image synthesis benchmarks, DPSGD outperforms all alternative GAN privatization schemes. Code: https://github.com/alexbie98/dpgan-revisit.  ( 2 min )
    LinGCN: Structural Linearized Graph Convolutional Network for Homomorphically Encrypted Inference. (arXiv:2309.14331v3 [cs.LG] UPDATED)
    The growth of Graph Convolution Network (GCN) model sizes has revolutionized numerous applications, surpassing human performance in areas such as personal healthcare and financial systems. The deployment of GCNs in the cloud raises privacy concerns due to potential adversarial attacks on client data. To address security concerns, Privacy-Preserving Machine Learning (PPML) using Homomorphic Encryption (HE) secures sensitive client data. However, it introduces substantial computational overhead in practical applications. To tackle those challenges, we present LinGCN, a framework designed to reduce multiplication depth and optimize the performance of HE based GCN inference. LinGCN is structured around three key elements: (1) A differentiable structural linearization algorithm, complemented by a parameterized discrete indicator function, co-trained with model weights to meet the optimization goal. This strategy promotes fine-grained node-level non-linear location selection, resulting in a model with minimized multiplication depth. (2) A compact node-wise polynomial replacement policy with a second-order trainable activation function, steered towards superior convergence by a two-level distillation approach from an all-ReLU based teacher model. (3) an enhanced HE solution that enables finer-grained operator fusion for node-wise activation functions, further reducing multiplication level consumption in HE-based inference. Our experiments on the NTU-XVIEW skeleton joint dataset reveal that LinGCN excels in latency, accuracy, and scalability for homomorphically encrypted inference, outperforming solutions such as CryptoGCN. Remarkably, LinGCN achieves a 14.2x latency speedup relative to CryptoGCN, while preserving an inference accuracy of 75% and notably reducing multiplication depth.  ( 3 min )
    Module-wise Training of Neural Networks via the Minimizing Movement Scheme. (arXiv:2309.17357v3 [cs.LG] UPDATED)
    Greedy layer-wise or module-wise training of neural networks is compelling in constrained and on-device settings where memory is limited, as it circumvents a number of problems of end-to-end back-propagation. However, it suffers from a stagnation problem, whereby early layers overfit and deeper layers stop increasing the test accuracy after a certain depth. We propose to solve this issue by introducing a module-wise regularization inspired by the minimizing movement scheme for gradient flows in distribution space. We call the method TRGL for Transport Regularized Greedy Learning and study it theoretically, proving that it leads to greedy modules that are regular and that progressively solve the task. Experimentally, we show improved accuracy of module-wise training of various architectures such as ResNets, Transformers and VGG, when our regularization is added, superior to that of other module-wise training methods and often to end-to-end training, with as much as 60% less memory usage.  ( 2 min )
    Learning Graph Laplacian with MCP. (arXiv:2010.11559v2 [cs.LG] UPDATED)
    We consider the problem of learning a graph under the Laplacian constraint with a non-convex penalty: minimax concave penalty (MCP). For solving the MCP penalized graphical model, we design an inexact proximal difference-of-convex algorithm (DCA) and prove its convergence to critical points. We note that each subproblem of the proximal DCA enjoys the nice property that the objective function in its dual problem is continuously differentiable with a semismooth gradient. Therefore, we apply an efficient semismooth Newton method to subproblems of the proximal DCA. Numerical experiments on various synthetic and real data sets demonstrate the effectiveness of the non-convex penalty MCP in promoting sparsity. Compared with the existing state-of-the-art method, our method is demonstrated to be more efficient and reliable for learning graph Laplacian with MCP.  ( 2 min )
    Latent Diffusion Energy-Based Model for Interpretable Text Modeling. (arXiv:2206.05895v4 [cs.LG] UPDATED)
    Latent space Energy-Based Models (EBMs), also known as energy-based priors, have drawn growing interests in generative modeling. Fueled by its flexibility in the formulation and strong modeling power of the latent space, recent works built upon it have made interesting attempts aiming at the interpretability of text modeling. However, latent space EBMs also inherit some flaws from EBMs in data space; the degenerate MCMC sampling quality in practice can lead to poor generation quality and instability in training, especially on data with complex latent structures. Inspired by the recent efforts that leverage diffusion recovery likelihood learning as a cure for the sampling issue, we introduce a novel symbiosis between the diffusion models and latent space EBMs in a variational learning framework, coined as the latent diffusion energy-based model. We develop a geometric clustering-based regularization jointly with the information bottleneck to further improve the quality of the learned latent space. Experiments on several challenging tasks demonstrate the superior performance of our model on interpretable text modeling over strong counterparts.  ( 2 min )
    Hadamard Domain Training with Integers for Class Incremental Quantized Learning. (arXiv:2310.03675v1 [cs.LG])
    Continual learning is a desirable feature in many modern machine learning applications, which allows in-field adaptation and updating, ranging from accommodating distribution shift, to fine-tuning, and to learning new tasks. For applications with privacy and low latency requirements, the compute and memory demands imposed by continual learning can be cost-prohibitive for resource-constraint edge platforms. Reducing computational precision through fully quantized training (FQT) simultaneously reduces memory footprint and increases compute efficiency for both training and inference. However, aggressive quantization especially integer FQT typically degrades model accuracy to unacceptable levels. In this paper, we propose a technique that leverages inexpensive Hadamard transforms to enable low-precision training with only integer matrix multiplications. We further determine which tensors need stochastic rounding and propose tiled matrix multiplication to enable low-bit width accumulators. We demonstrate the effectiveness of our technique on several human activity recognition datasets and CIFAR100 in a class incremental learning setting. We achieve less than 0.5% and 3% accuracy degradation while we quantize all matrix multiplications inputs down to 4-bits with 8-bit accumulators.  ( 2 min )
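    As a rough, hedged illustration of the general idea behind Hadamard-domain quantization (not the paper's actual training pipeline, rounding scheme, or accumulator handling): rotating activations with an orthonormal Hadamard matrix spreads outlier energy across dimensions, which tends to reduce the error of coarse integer quantization while keeping the core computation an integer matrix multiplication.

```python
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(0)

def quantize(x, bits=4):
    """Symmetric uniform quantization to signed integers (illustrative only)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.round(x / scale).astype(np.int32), scale

d = 64
H = hadamard(d) / np.sqrt(d)           # orthonormal Hadamard matrix, H @ H.T = I
x = rng.standard_normal((128, d))
x[:, 0] *= 30.0                        # inject an outlier channel
w = rng.standard_normal((d, 16))
ref = x @ w                            # float reference

# Plain 4-bit quantization of x and w, integer matmul, then rescale
xq, sx = quantize(x)
wq, sw = quantize(w)
plain = (xq @ wq) * (sx * sw)

# Same, but in the rotated (Hadamard) domain; H cancels since (xH)(H^T w) = x w
xhq, sxh = quantize(x @ H)
whq, swh = quantize(H.T @ w)
had = (xhq @ whq) * (sxh * swh)

print("plain-quant relative error:   ", np.linalg.norm(plain - ref) / np.linalg.norm(ref))
print("hadamard-quant relative error:", np.linalg.norm(had - ref) / np.linalg.norm(ref))
```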
    Deep Momentum Multi-Marginal Schr\"odinger Bridge. (arXiv:2303.01751v3 [stat.ML] UPDATED)
    It is a crucial challenge to reconstruct population dynamics using unlabeled samples from distributions at coarse time intervals. Recent approaches such as flow-based models or Schr\"odinger Bridge (SB) models have demonstrated appealing performance, yet the inferred sample trajectories either fail to account for the underlying stochasticity or are computationally prohibitive. To address these limitations, we propose $\underline{D}$eep $\underline{M}$omentum Multi-Marginal $\underline{S}$chr\"odinger $\underline{B}$ridge (DMSB), a novel computational framework that learns the smooth measure-valued spline for stochastic systems that satisfy position marginal constraints across time. By tailoring the celebrated Bregman Iteration and extending the Iteration Proportional Fitting to phase space, we manage to handle high-dimensional multi-marginal trajectory inference tasks efficiently. Our algorithm outperforms baselines significantly, as evidenced by experiments for synthetic datasets and a real-world single-cell RNA sequence dataset. Additionally, the proposed approach can reasonably reconstruct the evolution of velocity distribution, from position snapshots only, when there is a ground truth velocity that is nevertheless inaccessible.  ( 2 min )
    Spuriosity Rankings: Sorting Data to Measure and Mitigate Biases. (arXiv:2212.02648v2 [cs.CV] UPDATED)
    We present a simple but effective method to measure and mitigate model biases caused by reliance on spurious cues. Instead of requiring costly changes to one's data or model training, our method better utilizes the data one already has by sorting them. Specifically, we rank images within their classes based on spuriosity (the degree to which common spurious cues are present), proxied via deep neural features of an interpretable network. With spuriosity rankings, it is easy to identify minority subpopulations (i.e. low spuriosity images) and assess model bias as the gap in accuracy between high and low spuriosity images. One can even efficiently remove a model's bias at little cost to accuracy by finetuning its classification head on low spuriosity images, resulting in fairer treatment of samples regardless of spuriosity. We demonstrate our method on ImageNet, annotating $5000$ class-feature dependencies ($630$ of which we find to be spurious) and generating a dataset of $325k$ soft segmentations for these features along the way. Having computed spuriosity rankings via the identified spurious neural features, we assess biases for $89$ diverse models and find that class-wise biases are highly correlated across models. Our results suggest that model bias due to spurious feature reliance is influenced far more by what the model is trained on than how it is trained.  ( 3 min )
    Self-Supervised Masked Convolutional Transformer Block for Anomaly Detection. (arXiv:2209.12148v2 [cs.CV] UPDATED)
    Anomaly detection has recently gained increasing attention in the field of computer vision, likely due to its broad set of applications ranging from product fault detection on industrial production lines and impending event detection in video surveillance to finding lesions in medical scans. Regardless of the domain, anomaly detection is typically framed as a one-class classification task, where the learning is conducted on normal examples only. An entire family of successful anomaly detection methods is based on learning to reconstruct masked normal inputs (e.g. patches, future frames, etc.) and exerting the magnitude of the reconstruction error as an indicator for the abnormality level. Unlike other reconstruction-based methods, we present a novel self-supervised masked convolutional transformer block (SSMCTB) that comprises the reconstruction-based functionality at a core architectural level. The proposed self-supervised block is extremely flexible, enabling information masking at any layer of a neural network and being compatible with a wide range of neural architectures. In this work, we extend our previous self-supervised predictive convolutional attentive block (SSPCAB) with a 3D masked convolutional layer, a transformer for channel-wise attention, as well as a novel self-supervised objective based on Huber loss. Furthermore, we show that our block is applicable to a wider variety of tasks, adding anomaly detection in medical images and thermal videos to the previously considered tasks based on RGB images and surveillance videos. We exhibit the generality and flexibility of SSMCTB by integrating it into multiple state-of-the-art neural models for anomaly detection, bringing forth empirical results that confirm considerable performance improvements on five benchmarks. We release our code and data as open source at: https://github.com/ristea/ssmctb.  ( 3 min )
    High-Degrees-of-Freedom Dynamic Neural Fields for Robot Self-Modeling and Motion Planning. (arXiv:2310.03624v1 [cs.CV])
    A robot self-model is a task-agnostic representation of the robot's physical morphology that can be used for motion planning tasks in absence of classical geometric kinematic models. In particular, when the latter are hard to engineer or the robot's kinematics change unexpectedly, human-free self-modeling is a necessary feature of truly autonomous agents. In this work, we leverage neural fields to allow a robot to self-model its kinematics as a neural-implicit query model learned only from 2D images annotated with camera poses and configurations. This enables significantly greater applicability than existing approaches which have been dependent on depth images or geometry knowledge. To this end, alongside a curricular data sampling strategy, we propose a new encoder-based neural density field architecture for dynamic object-centric scenes conditioned on high numbers of degrees of freedom (DOFs). In a 7-DOF robot test setup, the learned self-model achieves a Chamfer-L2 distance of 2% of the robot's workspace dimension. We demonstrate the capabilities of this model on a motion planning task as an exemplary downstream application.  ( 2 min )
    Beyond Stationarity: Convergence Analysis of Stochastic Softmax Policy Gradient Methods. (arXiv:2310.02671v1 [math.OC] CROSS LISTED)
    Markov Decision Processes (MDPs) are a formal framework for modeling and solving sequential decision-making problems. In finite-time horizons such problems are relevant, for instance, for optimal stopping or specific supply chain problems, but also in the training of large language models. In contrast to infinite horizon MDPs, optimal policies are not stationary; policies must be learned for every single epoch. In practice all parameters are often trained simultaneously, ignoring the inherent structure suggested by dynamic programming. This paper introduces a combination of dynamic programming and policy gradient called dynamic policy gradient, where the parameters are trained backwards in time. For the tabular softmax parametrisation we carry out the convergence analysis for simultaneous and dynamic policy gradient towards global optima, both in the exact and sampled gradient settings without regularisation. It turns out that the use of dynamic policy gradient training much better exploits the structure of finite-time problems, which is reflected in improved convergence bounds.  ( 2 min )
    AnglE-optimized Text Embeddings. (arXiv:2309.12871v2 [cs.CL] UPDATED)
    High-quality text embedding is pivotal in improving semantic textual similarity (STS) tasks, which are crucial components in Large Language Model (LLM) applications. However, a common challenge existing text embedding models face is the problem of vanishing gradients, primarily due to their reliance on the cosine function in the optimization objective, which has saturation zones. To address this issue, this paper proposes a novel angle-optimized text embedding model called AnglE. The core idea of AnglE is to introduce angle optimization in a complex space. This novel approach effectively mitigates the adverse effects of the saturation zone in the cosine function, which can impede gradient and hinder optimization processes. To set up a comprehensive STS evaluation, we experimented on existing short-text STS datasets and a newly collected long-text STS dataset from GitHub Issues. Furthermore, we examine domain-specific STS scenarios with limited labeled data and explore how AnglE works with LLM-annotated data. Extensive experiments were conducted on various tasks including short-text STS, long-text STS, and domain-specific STS tasks. The results show that AnglE outperforms the state-of-the-art (SOTA) STS models that ignore the cosine saturation zone. These findings demonstrate the ability of AnglE to generate high-quality text embeddings and the usefulness of angle optimization in STS.  ( 2 min )
    MathCoder: Seamless Code Integration in LLMs for Enhanced Mathematical Reasoning. (arXiv:2310.03731v1 [cs.CL])
    The recently released GPT-4 Code Interpreter has demonstrated remarkable proficiency in solving challenging math problems, primarily attributed to its ability to seamlessly reason with natural language, generate code, execute code, and continue reasoning based on the execution output. In this paper, we present a method to fine-tune open-source language models, enabling them to use code for modeling and deriving math equations and, consequently, enhancing their mathematical reasoning abilities. We propose a method of generating novel and high-quality datasets with math problems and their code-based solutions, referred to as MathCodeInstruct. Each solution interleaves natural language, code, and execution results. We also introduce a customized supervised fine-tuning and inference approach. This approach yields the MathCoder models, a family of models capable of generating code-based solutions for solving challenging math problems. Impressively, the MathCoder models achieve state-of-the-art scores among open-source LLMs on the MATH (45.2%) and GSM8K (83.9%) datasets, substantially outperforming other open-source alternatives. Notably, the MathCoder model not only surpasses ChatGPT-3.5 and PaLM-2 on GSM8K and MATH but also outperforms GPT-4 on the competition-level MATH dataset. The dataset and models will be released at https://github.com/mathllm/MathCoder.  ( 2 min )
    Logic of Differentiable Logics: Towards a Uniform Semantics of DL. (arXiv:2303.10650v4 [cs.LO] UPDATED)
    Differentiable logics (DL) have recently been proposed as a method of training neural networks to satisfy logical specifications. A DL consists of a syntax in which specifications are stated and an interpretation function that translates expressions in the syntax into loss functions. These loss functions can then be used during training with standard gradient descent algorithms. The variety of existing DLs and the differing levels of formality with which they are treated makes a systematic comparative study of their properties and implementations difficult. This paper remedies this problem by suggesting a meta-language for defining DLs that we call the Logic of Differentiable Logics, or LDL. Syntactically, it generalises the syntax of existing DLs to FOL, and for the first time introduces the formalism for reasoning about vectors and learners. Semantically, it introduces a general interpretation function that can be instantiated to define loss functions arising from different existing DLs. We use LDL to establish several theoretical properties of existing DLs, and to conduct their empirical study in neural network verification.  ( 2 min )
    Towards Inferential Reproducibility of Machine Learning Research. (arXiv:2302.04054v6 [cs.LG] UPDATED)
    Reliability of machine learning evaluation -- the consistency of observed evaluation scores across replicated model training runs -- is affected by several sources of nondeterminism which can be regarded as measurement noise. Current tendencies to remove noise in order to enforce reproducibility of research results neglect inherent nondeterminism at the implementation level and disregard crucial interaction effects between algorithmic noise factors and data properties. This limits the scope of conclusions that can be drawn from such experiments. Instead of removing noise, we propose to incorporate several sources of variance, including their interaction with data properties, into an analysis of significance and reliability of machine learning evaluation, with the aim to draw inferences beyond particular instances of trained models. We show how to use linear mixed effects models (LMEMs) to analyze performance evaluation scores, and to conduct statistical inference with a generalized likelihood ratio test (GLRT). This allows us to incorporate arbitrary sources of noise like meta-parameter variations into statistical significance testing, and to assess performance differences conditional on data properties. Furthermore, a variance component analysis (VCA) enables the analysis of the contribution of noise sources to overall variance and the computation of a reliability coefficient by the ratio of substantial to total variance.  ( 3 min )
    MediTab: Scaling Medical Tabular Data Predictors via Data Consolidation, Enrichment, and Refinement. (arXiv:2305.12081v2 [cs.LG] UPDATED)
    Tabular data prediction has been employed in medical applications such as patient health risk prediction. However, existing methods usually revolve around the algorithm design while overlooking the significance of data engineering. Medical tabular datasets frequently exhibit significant heterogeneity across different sources, with limited sample sizes per source. As such, previous predictors are often trained on manually curated small datasets that struggle to generalize across different tabular datasets during inference. This paper proposes to scale medical tabular data predictors (MediTab) to various tabular inputs with varying features. The method uses a data engine that leverages large language models (LLMs) to consolidate tabular samples to overcome the barrier across tables with distinct schema. It also aligns out-domain data with the target task using a "learn, annotate, and refinement" pipeline. The expanded training data then enables the pre-trained MediTab to infer for arbitrary tabular input in the domain without fine-tuning, resulting in significant improvements over supervised baselines: it reaches an average ranking of 1.57 and 1.00 on 7 patient outcome prediction datasets and 3 trial outcome prediction datasets, respectively. In addition, MediTab exhibits impressive zero-shot performances: it outperforms supervised XGBoost models by 8.9% and 17.2% on average in two prediction tasks, respectively. The code is available at https://github.com/RyanWangZf/MediTab.  ( 3 min )
    DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines. (arXiv:2310.03714v1 [cs.CL])
    The ML community is rapidly exploring techniques for prompting language models (LMs) and for stacking them into pipelines that solve complex tasks. Unfortunately, existing LM pipelines are typically implemented using hard-coded "prompt templates", i.e. lengthy strings discovered via trial and error. Toward a more systematic approach for developing and optimizing LM pipelines, we introduce DSPy, a programming model that abstracts LM pipelines as text transformation graphs, i.e. imperative computational graphs where LMs are invoked through declarative modules. DSPy modules are parameterized, meaning they can learn (by creating and collecting demonstrations) how to apply compositions of prompting, finetuning, augmentation, and reasoning techniques. We design a compiler that will optimize any DSPy pipeline to maximize a given metric. We conduct two case studies, showing that succinct DSPy programs can express and optimize sophisticated LM pipelines that reason about math word problems, tackle multi-hop retrieval, answer complex questions, and control agent loops. Within minutes of compiling, a few lines of DSPy allow GPT-3.5 and llama2-13b-chat to self-bootstrap pipelines that outperform standard few-shot prompting (generally by over 25% and 65%, respectively) and pipelines with expert-created demonstrations (by up to 5-46% and 16-40%, respectively). On top of that, DSPy programs compiled to open and relatively small LMs like 770M-parameter T5 and llama2-13b-chat are competitive with approaches that rely on expert-written prompt chains for proprietary GPT-3.5. DSPy is available at https://github.com/stanfordnlp/dspy  ( 3 min )
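    A small usage sketch of the declarative style described above, based on the public DSPy repository's documented interface around this time; exact class and argument names may differ across versions, so treat this as illustrative rather than authoritative.

```python
import dspy

# Configure the underlying LM (constructor names per the DSPy docs of this period; may have changed)
lm = dspy.OpenAI(model="gpt-3.5-turbo")
dspy.settings.configure(lm=lm)

# A declarative module: the signature string names the inputs and outputs,
# and ChainOfThought adds an intermediate reasoning field automatically.
solve = dspy.ChainOfThought("question -> answer")

pred = solve(question="If a train travels 60 km in 45 minutes, what is its average speed in km/h?")
print(pred.answer)
```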
    GENER: A Parallel Layer Deep Learning Network To Detect Gene-Gene Interactions From Gene Expression Data. (arXiv:2310.03611v1 [cs.LG])
    Detecting and discovering new gene interactions based on known gene expressions and gene interaction data presents a significant challenge. Various statistical and deep learning methods have attempted to tackle this challenge by leveraging the topological structure of gene interactions and gene expression patterns to predict novel gene interactions. In contrast, some approaches have focused exclusively on utilizing gene expression profiles. In this context, we introduce GENER, a parallel-layer deep learning network designed exclusively for the identification of gene-gene relationships using gene expression data. We conducted two training experiments and compared the performance of our network with that of existing statistical and deep learning approaches. Notably, our model achieved an average AUROC score of 0.834 on the combined BioGRID&DREAM5 dataset, outperforming competing methods in predicting gene-gene interactions.  ( 2 min )
    Efficient Graph Field Integrators Meet Point Clouds. (arXiv:2302.00942v6 [cs.LG] UPDATED)
    We present two new classes of algorithms for efficient field integration on graphs encoding point clouds. The first class, SeparatorFactorization(SF), leverages the bounded genus of point cloud mesh graphs, while the second class, RFDiffusion(RFD), uses popular epsilon-nearest-neighbor graph representations for point clouds. Both can be viewed as providing the functionality of Fast Multipole Methods (FMMs), which have had a tremendous impact on efficient integration, but for non-Euclidean spaces. We focus on geometries induced by distributions of walk lengths between points (e.g., shortest-path distance). We provide an extensive theoretical analysis of our algorithms, obtaining new results in structural graph theory as a byproduct. We also perform exhaustive empirical evaluation, including on-surface interpolation for rigid and deformable objects (particularly for mesh-dynamics modeling), Wasserstein distance computations for point clouds, and the Gromov-Wasserstein variant.  ( 2 min )
    One-Versus-Others Attention: Scalable Multimodal Integration. (arXiv:2307.05435v2 [cs.LG] UPDATED)
    Multimodal learning models have become increasingly important as they surpass single-modality approaches on diverse tasks ranging from question-answering to autonomous driving. Despite the importance of multimodal learning, existing efforts focus on NLP applications, where the number of modalities is typically less than four (audio, video, text, images). However, data inputs in other domains, such as the medical field, may include X-rays, PET scans, MRIs, genetic screening, clinical notes, and more, creating a need for both efficient and accurate information fusion. Many state-of-the-art models rely on pairwise cross-modal attention, which does not scale well for applications with more than three modalities. For $n$ modalities, computing attention will result in $n \choose 2$ operations, potentially requiring considerable amounts of computational resources. To address this, we propose a new domain-neutral attention mechanism, One-Versus-Others (OvO) attention, that scales linearly with the number of modalities and requires only $n$ attention operations, thus offering a significant reduction in computational complexity compared to existing cross-modal attention algorithms. Using three diverse real-world datasets as well as an additional simulation experiment, we show that our method improves performance compared to popular fusion techniques while decreasing computation costs.  ( 2 min )
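    A hedged toy sketch of the one-versus-others idea (each modality attends once against a summary of all the others, so the number of attention calls grows linearly in n); the pooling, fusion, and summary choices here are placeholders, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class OvOFusion(nn.Module):
    """Toy fusion layer: one attention call per modality against the remaining modalities."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, modalities):
        # modalities: list of n tensors, each (batch, tokens_i, dim)
        summaries = [m.mean(dim=1) for m in modalities]            # (batch, dim) per modality
        fused = []
        for i, m in enumerate(modalities):
            others = torch.stack([s for j, s in enumerate(summaries) if j != i], dim=1)
            out, _ = self.attn(m, others, others)                  # one attention op for modality i
            fused.append(out.mean(dim=1))
        return torch.cat(fused, dim=-1)                            # (batch, n * dim)

# Dummy usage with three modalities of different lengths
mods = [torch.randn(2, t, 32) for t in (10, 6, 4)]
print(OvOFusion(dim=32)(mods).shape)   # torch.Size([2, 96])
```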
    Handling Data Heterogeneity in Federated Learning via Knowledge Distillation and Fusion. (arXiv:2207.11447v2 [cs.LG] UPDATED)
    Federated learning (FL) supports distributed training of a global machine learning model across multiple devices with the help of a central server. However, data heterogeneity across different devices leads to the client model drift issue and results in model performance degradation and poor model fairness. To address the issue, we design Federated learning with global-local Knowledge Fusion (FedKF) scheme in this paper. The key idea in FedKF is to let the server return the global knowledge to be fused with the local knowledge in each training round so that the local model can be regularized towards the global optima. Therefore, the client model drift issue can be mitigated. In FedKF, we first propose the active-inactive model aggregation technique that supports a precise global knowledge representation. Then, we propose a data-free knowledge distillation (KD) approach to enable each client model to learn the global knowledge (embedded in the global model) while each client model can still learn the local knowledge (embedded in the local dataset) simultaneously, thereby realizing the global-local knowledge fusion process. The theoretical analysis and intensive experiments demonstrate the superiority of FedKF over previous solutions.  ( 2 min )
    Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!. (arXiv:2310.03693v1 [cs.CL])
    Optimizing large language models (LLMs) for downstream use cases often involves the customization of pre-trained LLMs through further fine-tuning. Meta's open release of Llama models and OpenAI's APIs for fine-tuning GPT-3.5 Turbo on custom datasets also encourage this practice. But, what are the safety costs associated with such custom fine-tuning? We note that while existing safety alignment infrastructures can restrict harmful behaviors of LLMs at inference time, they do not cover safety risks when fine-tuning privileges are extended to end-users. Our red teaming studies find that the safety alignment of LLMs can be compromised by fine-tuning with only a few adversarially designed training examples. For instance, we jailbreak GPT-3.5 Turbo's safety guardrails by fine-tuning it on only 10 such examples at a cost of less than $0.20 via OpenAI's APIs, making the model responsive to nearly any harmful instructions. Disconcertingly, our research also reveals that, even without malicious intent, simply fine-tuning with benign and commonly used datasets can also inadvertently degrade the safety alignment of LLMs, though to a lesser extent. These findings suggest that fine-tuning aligned LLMs introduces new safety risks that current safety infrastructures fall short of addressing -- even if a model's initial safety alignment is impeccable, it is not necessarily to be maintained after custom fine-tuning. We outline and critically analyze potential mitigations and advocate for further research efforts toward reinforcing safety protocols for the custom fine-tuning of aligned LLMs.  ( 3 min )
    Demystifying Oversmoothing in Attention-Based Graph Neural Networks. (arXiv:2305.16102v2 [cs.LG] UPDATED)
    Oversmoothing in Graph Neural Networks (GNNs) refers to the phenomenon where increasing network depth leads to homogeneous node representations. While previous work has established that Graph Convolutional Networks (GCNs) exponentially lose expressive power, it remains controversial whether the graph attention mechanism can mitigate oversmoothing. In this work, we provide a definitive answer to this question through a rigorous mathematical analysis, by viewing attention-based GNNs as nonlinear time-varying dynamical systems and incorporating tools and techniques from the theory of products of inhomogeneous matrices and the joint spectral radius. We establish that, contrary to popular belief, the graph attention mechanism cannot prevent oversmoothing and loses expressive power exponentially. The proposed framework extends the existing results on oversmoothing for symmetric GCNs to a significantly broader class of GNN models, including random walk GCNs, Graph Attention Networks (GATs) and (graph) transformers. In particular, our analysis accounts for asymmetric, state-dependent and time-varying aggregation operators and a wide range of common nonlinear activation functions, such as ReLU, LeakyReLU, GELU and SiLU.  ( 2 min )
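    A quick numerical illustration of the oversmoothing phenomenon itself (plain mean aggregation on a random graph, not the attention-based analysis of the paper): repeatedly applying the normalized aggregation drives all node representations toward the same vector.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Random undirected graph with self-loops, then row-normalize the adjacency
A = (rng.random((n, n)) < 0.1).astype(float)
A = np.maximum(A, A.T) + np.eye(n)
P = A / A.sum(axis=1, keepdims=True)

X = rng.standard_normal((n, 16))   # initial node features
for depth in (1, 2, 4, 8, 16, 32):
    H = np.linalg.matrix_power(P, depth) @ X
    spread = np.linalg.norm(H - H.mean(axis=0), axis=1).mean()
    print(f"depth {depth:2d}: mean distance of node features from their average = {spread:.4f}")
```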
    Modality Cycles with Masked Conditional Diffusion for Unsupervised Anomaly Segmentation in MRI. (arXiv:2308.16150v2 [eess.IV] UPDATED)
    Unsupervised anomaly segmentation aims to detect patterns that are distinct from any patterns processed during training, commonly called abnormal or out-of-distribution patterns, without providing any associated manual segmentations. Since anomalies during deployment can lead to model failure, detecting the anomaly can enhance the reliability of models, which is valuable in high-risk domains like medical imaging. This paper introduces Masked Modality Cycles with Conditional Diffusion (MMCCD), a method that enables segmentation of anomalies across diverse patterns in multimodal MRI. The method is based on two fundamental ideas. First, we propose the use of cyclic modality translation as a mechanism for enabling abnormality detection. Image-translation models learn tissue-specific modality mappings, which are characteristic of tissue physiology. Thus, these learned mappings fail to translate tissues or image patterns that have never been encountered during training, and the error enables their segmentation. Furthermore, we combine image translation with a masked conditional diffusion model, which attempts to `imagine' what tissue exists under a masked area, further exposing unknown patterns as the generative model fails to recreate them. We evaluate our method on a proxy task by training on healthy-looking slices of BraTS2021 multi-modality MRIs and testing on slices with tumors. We show that our method compares favorably to previous unsupervised approaches based on image reconstruction and denoising with autoencoders and diffusion models.  ( 3 min )
    An Algebraically Converging Stochastic Gradient Descent Algorithm for Global Optimization. (arXiv:2204.05923v3 [math.OC] UPDATED)
    We propose a new gradient descent algorithm with added stochastic terms for finding the global optimizers of nonconvex optimization problems. A key component in the algorithm is the adaptive tuning of the randomness based on the value of the objective function. In the language of simulated annealing, the temperature is state-dependent. With this, we prove the global convergence of the algorithm with an algebraic rate both in probability and in the parameter space. This is a significant improvement over the classical rate from using a more straightforward control of the noise term. The convergence proof is based on the actual discrete setup of the algorithm, not just its continuous limit as often done in the literature. We also present several numerical examples to demonstrate the efficiency and robustness of the algorithm for reasonably complex objective functions.  ( 2 min )
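    A minimal, hedged sketch of the mechanism described in the abstract (gradient descent with added noise whose scale depends on the current objective value); the schedule, constants, and test function below are placeholders, not the paper's algorithm or the parameters behind its convergence rates.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Rastrigin function: a standard nonconvex test problem with many local minima
    return np.sum(x**2 - 10.0 * np.cos(2 * np.pi * x) + 10.0)

def grad_f(x):
    return 2.0 * x + 20.0 * np.pi * np.sin(2 * np.pi * x)

x = rng.uniform(-4, 4, size=5)
lr = 1e-3
for _ in range(20000):
    # State-dependent "temperature": more exploration while f(x) is large,
    # nearly plain gradient descent once the iterate reaches good regions.
    sigma = 0.05 * np.sqrt(max(f(x), 1e-12))
    x = x - lr * grad_f(x) + np.sqrt(lr) * sigma * rng.standard_normal(x.shape)

print("final objective value:", f(x))
```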
    Stochastic interpolants with data-dependent couplings. (arXiv:2310.03725v1 [cs.LG])
    Generative models inspired by dynamical transport of measure -- such as flows and diffusions -- construct a continuous-time map between two probability densities. Conventionally, one of these is the target density, only accessible through samples, while the other is taken as a simple base density that is data-agnostic. In this work, using the framework of stochastic interpolants, we formalize how to \textit{couple} the base and the target densities. This enables us to incorporate information about class labels or continuous embeddings to construct dynamical transport maps that serve as conditional generative models. We show that these transport maps can be learned by solving a simple square loss regression problem analogous to the standard independent setting. We demonstrate the usefulness of constructing dependent couplings in practice through experiments in super-resolution and in-painting.  ( 2 min )
    Learning Hierarchical Interactive Multi-Object Search for Mobile Manipulation. (arXiv:2307.06125v2 [cs.RO] UPDATED)
    Existing object-search approaches enable robots to search through free pathways, however, robots operating in unstructured human-centered environments frequently also have to manipulate the environment to their needs. In this work, we introduce a novel interactive multi-object search task in which a robot has to open doors to navigate rooms and search inside cabinets and drawers to find target objects. These new challenges require combining manipulation and navigation skills in unexplored environments. We present HIMOS, a hierarchical reinforcement learning approach that learns to compose exploration, navigation, and manipulation skills. To achieve this, we design an abstract high-level action space around a semantic map memory and leverage the explored environment as instance navigation points. We perform extensive experiments in simulation and the real world that demonstrate that, with accurate perception, the decision making of HIMOS effectively transfers to new environments in a zero-shot manner. It shows robustness to unseen subpolicies, failures in their execution, and different robot kinematics. These capabilities open the door to a wide range of downstream tasks across embodied AI and real-world use cases.  ( 2 min )
    Smoothing Methods for Automatic Differentiation Across Conditional Branches. (arXiv:2310.03585v1 [cs.LG])
    Programs involving discontinuities introduced by control flow constructs such as conditional branches pose challenges to mathematical optimization methods that assume a degree of smoothness in the objective function's response surface. Smooth interpretation (SI) is a form of abstract interpretation that approximates the convolution of a program's output with a Gaussian kernel, thus smoothing its output in a principled manner. Here, we combine SI with automatic differentiation (AD) to efficiently compute gradients of smoothed programs. In contrast to AD across a regular program execution, these gradients also capture the effects of alternative control flow paths. The combination of SI with AD enables the direct gradient-based parameter synthesis for branching programs, allowing for instance the calibration of simulation models or their combination with neural network models in machine learning pipelines. We detail the effects of the approximations made for tractability in SI and propose a novel Monte Carlo estimator that avoids the underlying assumptions by estimating the smoothed programs' gradients through a combination of AD and sampling. Using DiscoGrad, our tool for automatically translating simple C++ programs to a smooth differentiable form, we perform an extensive evaluation. We compare the combination of SI with AD and our Monte Carlo estimator to existing gradient-free and stochastic methods on four non-trivial and originally discontinuous problems ranging from classical simulation-based optimization to neural network-driven control. While the optimization progress with the SI-based estimator depends on the complexity of the programs' control flow, our Monte Carlo estimator is competitive in all problems, exhibiting the fastest convergence by a substantial margin in our highest-dimensional problem.  ( 3 min )
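    For contrast with the AD-based estimators described above, the snippet below shows a standard sampling-only baseline: a score-function Monte Carlo estimate of the gradient of a Gaussian-smoothed discontinuous program. It is not DiscoGrad's estimator, but it makes concrete what "gradient of the smoothed program" means for a branching function.

```python
import numpy as np

def branching_program(x):
    """A discontinuous 'program': the control flow depends on a threshold."""
    return 1.0 if x[0] + x[1] > 1.0 else (x[0] - x[1]) ** 2

def smoothed_grad(f, x, sigma=0.3, n=4096, seed=0):
    """Score-function Monte Carlo estimate of the gradient of the Gaussian-smoothed
    program E_eps[f(x + sigma * eps)].  Unlike plain AD along one execution path,
    it accounts for both branches."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n, x.size))
    vals = np.array([f(x + sigma * e) for e in eps])
    return (vals[:, None] * eps).mean(axis=0) / sigma

g = smoothed_grad(branching_program, np.array([0.4, 0.5]))
```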
    Forecasting Tropical Cyclones with Cascaded Diffusion Models. (arXiv:2310.01690v2 [physics.ao-ph] UPDATED)
    As cyclones become more intense due to climate change, the rise of AI-based modelling provides a more affordable and accessible approach compared to traditional methods based on mathematical models. This work leverages diffusion models to forecast cyclone trajectories and precipitation patterns by integrating satellite imaging, remote sensing, and atmospheric data, employing a cascaded approach that incorporates forecasting, super-resolution, and precipitation modelling, with training on a dataset of 51 cyclones from six major basins. Experiments demonstrate that the final forecasts from the cascaded models show accurate predictions up to a 36-hour rollout, with SSIM and PSNR values exceeding 0.5 and 20 dB, respectively, for all three tasks. This work also highlights the promising efficiency of AI methods such as diffusion models for high-performance needs, such as cyclone forecasting, while remaining computationally affordable, making them ideal for highly vulnerable regions with critical forecasting needs and financial limitations. Code accessible at \url{https://github.com/nathzi1505/forecast-diffmodels}.  ( 2 min )
    Abstractors and relational cross-attention: An inductive bias for explicit relational reasoning in Transformers. (arXiv:2304.00195v3 [stat.ML] UPDATED)
    An extension of Transformers is proposed that enables explicit relational reasoning through a novel module called the Abstractor. At the core of the Abstractor is a variant of attention called relational cross-attention. The approach is motivated by an architectural inductive bias for relational learning that disentangles relational information from extraneous features about individual objects. This enables explicit relational reasoning, supporting abstraction and generalization from limited data. The Abstractor is first evaluated on simple discriminative relational tasks and compared to existing relational architectures. Next, the Abstractor is evaluated on purely relational sequence-to-sequence tasks, where dramatic improvements are seen in sample efficiency compared to standard Transformers. Finally, Abstractors are evaluated on a collection of tasks based on mathematical problem solving, where modest but consistent improvements in performance and sample efficiency are observed.  ( 2 min )
    Residual Multi-Fidelity Neural Network Computing. (arXiv:2310.03572v1 [cs.LG])
    In this work, we consider the general problem of constructing a neural network surrogate model using multi-fidelity information. Given an inexpensive low-fidelity and an expensive high-fidelity computational model, we present a residual multi-fidelity computational framework that formulates the correlation between models as a residual function, a possibly non-linear mapping between 1) the shared input space of the models together with the low-fidelity model output and 2) the discrepancy between the two model outputs. To accomplish this, we train two neural networks to work in concert. The first network learns the residual function on a small set of high-fidelity and low-fidelity data. Once trained, this network is used to generate additional synthetic high-fidelity data, which is used in the training of a second network. This second network, once trained, acts as our surrogate for the high-fidelity quantity of interest. We present three numerical examples to demonstrate the power of the proposed framework. In particular, we show that dramatic savings in computational cost may be achieved when the output predictions are desired to be accurate within small tolerances.  ( 2 min )
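    A minimal sketch of the three-stage recipe (residual network on scarce high-fidelity data, synthetic data generation, final surrogate) is given below with toy one-dimensional "models"; the architectures, sample sizes, and fidelity functions are placeholders, not the paper's setup.

```python
import torch
import torch.nn as nn

def mlp(inp, out):
    return nn.Sequential(nn.Linear(inp, 64), nn.Tanh(), nn.Linear(64, out))

low_fidelity = lambda x: torch.sin(x)                  # cheap model (toy)
high_fidelity = lambda x: torch.sin(x) + 0.2 * x**2    # expensive model (toy)

def fit(model, x, y, steps=1000, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((model(x) - y) ** 2).mean()
        loss.backward()
        opt.step()
    return model

# 1) residual network r(x, y_LF) ~= y_HF - y_LF, trained on a small high-fidelity set
x_hf = torch.linspace(-2, 2, 16).unsqueeze(-1)
y_lf, y_hf = low_fidelity(x_hf), high_fidelity(x_hf)
residual = fit(mlp(2, 1), torch.cat([x_hf, y_lf], -1), y_hf - y_lf)

# 2) synthesize abundant "high-fidelity" labels from cheap evaluations
x_big = torch.linspace(-2, 2, 512).unsqueeze(-1)
y_lf_big = low_fidelity(x_big)
y_synth = y_lf_big + residual(torch.cat([x_big, y_lf_big], -1)).detach()

# 3) final surrogate for the high-fidelity quantity of interest
surrogate = fit(mlp(1, 1), x_big, y_synth)
```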
    RUSOpt: Robotic UltraSound Probe Normalization with Bayesian Optimization for In-plane and Out-plane Scanning. (arXiv:2310.03406v1 [cs.RO])
    One of the significant challenges faced by autonomous robotic ultrasound systems is acquiring high-quality images across different patients. The proper orientation of the robotized probe plays a crucial role in governing the quality of ultrasound images. To address this challenge, we propose a sample-efficient method to automatically adjust the orientation of the ultrasound probe normal to the point of contact on the scanning surface, thereby improving the acoustic coupling of the probe and the resulting image quality. Our method utilizes Bayesian Optimization (BO) based search on the scanning surface to efficiently search for the normalized probe orientation. We formulate a novel objective function for BO that leverages the contact force measurements and underlying mechanics to identify the normal. We further incorporate a regularization scheme in BO to handle the noisy objective function. The performance of the proposed strategy has been assessed through experiments on urinary bladder phantoms. These phantoms included planar, tilted, and rough surfaces, and were examined using both linear and convex probes with varying search space limits. Further, simulation-based studies have been carried out using 3D human mesh models. The results demonstrate that the mean ($\pm$SD) absolute angular error averaged over all phantoms and 3D models is $\boldsymbol{2.4\pm0.7^\circ}$ and $\boldsymbol{2.1\pm1.3^\circ}$, respectively.  ( 2 min )
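    The search itself follows a standard Bayesian-optimization loop; a generic sketch over two tilt angles is shown below, with a stand-in objective in place of the paper's force-based objective and regularization scheme.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Generic BO loop over two probe tilt angles.  The toy objective peaks at an unknown
# "normal" orientation; in the paper the objective is built from contact-force
# measurements and contact mechanics.

def toy_force_objective(angles, normal=np.array([5.0, -3.0])):
    return -np.sum((angles - normal) ** 2) + 0.1 * np.random.randn()

def expected_improvement(mu, sigma, best):
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
bounds = np.array([[-15.0, 15.0], [-15.0, 15.0]])           # search limits in degrees
X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(5, 2))    # initial random probes
y = np.array([toy_force_objective(x) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-2, normalize_y=True)
for _ in range(25):
    gp.fit(X, y)
    cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(2000, 2))
    mu, sigma = gp.predict(cand, return_std=True)
    x_next = cand[np.argmax(expected_improvement(mu, sigma, y.max()))]
    X = np.vstack([X, x_next])
    y = np.append(y, toy_force_objective(x_next))

best_orientation = X[np.argmax(y)]
```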
    Deep Generative Models of Music Expectation. (arXiv:2310.03500v1 [cs.SD])
    A prominent theory of affective response to music revolves around the concepts of surprisal and expectation. In prior work, this idea has been operationalized in the form of probabilistic models of music which allow for precise computation of song (or note-by-note) probabilities, conditioned on a 'training set' of prior musical or cultural experiences. To date, however, these models have been limited to computing exact probabilities through hand-crafted features or restricted to linear models, which are likely not sufficient to represent the complex conditional distributions present in music. In this work, we propose to use modern deep probabilistic generative models in the form of a Diffusion Model to compute an approximate likelihood of a musical input sequence. Unlike prior work, such a generative model parameterized by deep neural networks is able to learn complex non-linear features directly from a training set itself. In doing so, we expect to find that such models are able to more accurately represent the 'surprisal' of music for human listeners. From the literature, it is known that there is an inverted U-shaped relationship between surprisal and the amount human subjects 'like' a given song. In this work, we show that pre-trained diffusion models indeed yield musical surprisal values which exhibit a negative quadratic relationship with measured subject 'liking' ratings, and that the quality of this relationship is competitive with state of the art methods such as IDyOM. We therefore present this model as a preliminary step in developing modern deep generative models of music expectation and subjective likability.  ( 2 min )
    Constraint-Conditioned Policy Optimization for Versatile Safe Reinforcement Learning. (arXiv:2310.03718v1 [cs.LG])
    Safe reinforcement learning (RL) focuses on training reward-maximizing agents subject to pre-defined safety constraints. Yet, learning versatile safe policies that can adapt to varying safety constraint requirements during deployment without retraining remains a largely unexplored and challenging area. In this work, we formulate the versatile safe RL problem and consider two primary requirements: training efficiency and zero-shot adaptation capability. To address them, we introduce the Conditioned Constrained Policy Optimization (CCPO) framework, consisting of two key modules: (1) Versatile Value Estimation (VVE) for approximating value functions under unseen threshold conditions, and (2) Conditioned Variational Inference (CVI) for encoding arbitrary constraint thresholds during policy optimization. Our extensive experiments demonstrate that CCPO outperforms the baselines in terms of safety and task performance while preserving zero-shot adaptation capabilities to different constraint thresholds data-efficiently. This makes our approach suitable for real-world dynamic applications.  ( 2 min )
    Optimal 1-Wasserstein Distance for WGANs. (arXiv:2201.02824v2 [stat.ML] UPDATED)
    The mathematical forces at work behind Generative Adversarial Networks raise challenging theoretical issues. Motivated by the important question of characterizing the geometrical properties of the generated distributions, we provide a thorough analysis of Wasserstein GANs (WGANs) in both the finite sample and asymptotic regimes. We study the specific case where the latent space is univariate and derive results valid regardless of the dimension of the output space. We show in particular that for a fixed sample size, the optimal WGANs are closely linked with connected paths minimizing the sum of the squared Euclidean distances between the sample points. We also highlight the fact that WGANs are able to approach (for the 1-Wasserstein distance) the target distribution as the sample size tends to infinity, at a given convergence rate and provided the family of generative Lipschitz functions grows appropriately. We derive in passing new results on optimal transport theory in the semi-discrete setting.  ( 2 min )
    How the level sampling process impacts zero-shot generalisation in deep reinforcement learning. (arXiv:2310.03494v1 [cs.LG])
    A key limitation preventing the wider adoption of autonomous agents trained via deep reinforcement learning (RL) is their limited ability to generalise to new environments, even when these share similar characteristics with environments encountered during training. In this work, we investigate how a non-uniform sampling strategy of individual environment instances, or levels, affects the zero-shot generalisation (ZSG) ability of RL agents, considering two failure modes: overfitting and over-generalisation. As a first step, we measure the mutual information (MI) between the agent's internal representation and the set of training levels, which we find to be well-correlated to instance overfitting. In contrast to uniform sampling, adaptive sampling strategies prioritising levels based on their value loss are more effective at maintaining lower MI, which provides a novel theoretical justification for this class of techniques. We then turn our attention to unsupervised environment design (UED) methods, which adaptively generate new training levels and minimise MI more effectively than methods sampling from a fixed set. However, we find UED methods significantly shift the training distribution, resulting in over-generalisation and worse ZSG performance over the distribution of interest. To prevent both instance overfitting and over-generalisation, we introduce self-supervised environment design (SSED). SSED generates levels using a variational autoencoder, effectively reducing MI while minimising the shift with the distribution of interest, and leads to statistically significant improvements in ZSG over fixed-set level sampling strategies and UED methods.  ( 3 min )
    FLAIM: AIM-based Synthetic Data Generation in the Federated Setting. (arXiv:2310.03447v1 [cs.CR])
    Preserving individual privacy while enabling collaborative data sharing is crucial for organizations. Synthetic data generation is one solution, producing artificial data that mirrors the statistical properties of private data. While numerous techniques have been devised under differential privacy, they predominantly assume data is centralized. However, data is often distributed across multiple clients in a federated manner. In this work, we initiate the study of federated synthetic tabular data generation. Building upon a SOTA central method known as AIM, we present DistAIM and FLAIM. We show it is straightforward to distribute AIM by extending a recent approach based on secure multi-party computation; however, this necessitates additional overhead, making it less suited to federated scenarios. We then demonstrate that naively federating AIM can lead to substantial degradation in utility under the presence of heterogeneity. To mitigate both issues, we propose an augmented FLAIM approach that maintains a private proxy of heterogeneity. We simulate our methods across a range of benchmark datasets under different degrees of heterogeneity and show this can improve utility while reducing overhead.  ( 2 min )
    Deep Learning for Genomics: A Concise Overview. (arXiv:1802.00810v4 [q-bio.GN] UPDATED)
    Advancements in genomic research such as high-throughput sequencing techniques have driven modern genomic studies into "big data" disciplines. This data explosion is constantly challenging conventional methods used in genomics. In parallel with the urgent demand for robust algorithms, deep learning has succeeded in a variety of fields such as vision, speech, and text processing. Yet genomics poses unique challenges for deep learning, since we expect it to provide a superhuman intelligence that explores beyond our knowledge to interpret the genome. A powerful deep learning model should rely on insightful utilization of task-specific knowledge. In this paper, we briefly discuss the strengths of different deep learning models from a genomic perspective so as to fit each particular task with a proper deep architecture, and remark on practical considerations of developing modern deep learning architectures for genomics. We also provide a concise review of deep learning applications in various aspects of genomic research, and point out potential opportunities and obstacles for future genomics applications.  ( 2 min )
    Physics of Language Models: Part 1, Context-Free Grammar. (arXiv:2305.13673v2 [cs.CL] UPDATED)
    We design controlled experiments to study HOW generative language models, like GPT, learn context-free grammars (CFGs) -- diverse language systems with a tree-like structure capturing many aspects of natural languages, programs, and logics. CFGs are as hard as pushdown automata, and can be ambiguous so that verifying if a string satisfies the rules requires dynamic programming. We construct synthetic data and demonstrate that even for difficult (long and ambiguous) CFGs, pre-trained transformers can learn to generate sentences with near-perfect accuracy and impressive diversity. More importantly, we delve into the physical principles behind how transformers learn CFGs. We discover that the hidden states within the transformer implicitly and precisely encode the CFG structure (such as putting tree node information exactly on the subtree boundary), and learn to form "boundary to boundary" attentions resembling dynamic programming. We also cover some extensions of CFGs as well as the robustness of transformers against grammar mistakes. Overall, our research provides a comprehensive and empirical understanding of how transformers learn CFGs, and reveals the physical mechanisms utilized by transformers to capture the structure and rules of languages.  ( 2 min )
    PlaceNav: Topological Navigation through Place Recognition. (arXiv:2309.17260v3 [cs.RO] UPDATED)
    Recent results suggest that splitting topological navigation into robot-independent and robot-specific components improves navigation performance by enabling the robot-independent part to be trained with data collected by different robot types. However, the navigation methods are still limited by the scarcity of suitable training data and suffer from poor computational scaling. In this work, we present PlaceNav, subdividing the robot-independent part into navigation-specific and generic computer vision components. We utilize visual place recognition for the subgoal selection of the topological navigation pipeline. This makes subgoal selection more efficient and enables leveraging large-scale datasets from non-robotics sources, increasing training data availability. Bayesian filtering, enabled by place recognition, further improves navigation performance by increasing the temporal consistency of subgoals. Our experimental results verify the design and the new model obtains a 76% higher success rate in indoor and 23% higher in outdoor navigation tasks with higher computational efficiency.  ( 2 min )
    Anytime-valid t-tests and confidence sequences for Gaussian means with unknown variance. (arXiv:2310.03722v1 [math.ST])
    In 1976, Lai constructed a nontrivial confidence sequence for the mean $\mu$ of a Gaussian distribution with unknown variance $\sigma$. Curiously, he employed both an improper (right Haar) mixture over $\sigma$ and an improper (flat) mixture over $\mu$. Here, we elaborate carefully on the details of his construction, which use generalized nonintegrable martingales and an extended Ville's inequality. While this does yield a sequential t-test, it does not yield an ``e-process'' (due to the nonintegrability of his martingale). In this paper, we develop two new e-processes and confidence sequences for the same setting: one is a test martingale in a reduced filtration, while the other is an e-process in the canonical data filtration. These are respectively obtained by swapping Lai's flat mixture for a Gaussian mixture, and swapping the right Haar mixture over $\sigma$ with the maximum likelihood estimate under the null, as done in universal inference. We also analyze the width of resulting confidence sequences, which have a curious dependence on the error probability $\alpha$. Numerical experiments are provided along the way to compare and contrast the various approaches.  ( 2 min )
    Network Cascade Vulnerability using Constrained Bayesian Optimization. (arXiv:2304.14420v2 [cs.SI] UPDATED)
    Measures of power grid vulnerability are often assessed by the amount of damage an adversary can exact on the network. However, the cascading impact of such attacks is often overlooked, even though cascades are one of the primary causes of large-scale blackouts. This paper explores modifications of transmission line protection settings as candidates for adversarial attacks, which can remain undetectable as long as the network equilibrium state remains unaltered. This forms the basis of a black-box function in a Bayesian optimization procedure, where the objective is to find protection settings that maximize network degradation due to cascading. Notably, our proposed method is agnostic to the choice of the cascade simulator and its underlying assumptions. Numerical experiments reveal that, against conventional wisdom, maximally misconfiguring the protection settings of all network lines does not cause the most cascading. More surprisingly, even when the degree of misconfiguration is limited due to resource constraints, it is still possible to find settings that produce cascades comparable in severity to instances where there are no resource constraints.  ( 2 min )
    HeaP: Hierarchical Policies for Web Actions using LLMs. (arXiv:2310.03720v1 [cs.LG])
    Large language models (LLMs) have demonstrated remarkable capabilities in performing a range of instruction following tasks in few and zero-shot settings. However, teaching LLMs to perform tasks on the web presents fundamental challenges -- combinatorially large open-world tasks and variations across web interfaces. We tackle these challenges by leveraging LLMs to decompose web tasks into a collection of sub-tasks, each of which can be solved by a low-level, closed-loop policy. These policies constitute a shared grammar across tasks, i.e., new web tasks can be expressed as a composition of these policies. We propose a novel framework, Hierarchical Policies for Web Actions using LLMs (HeaP), that learns a set of hierarchical LLM prompts from demonstrations for planning high-level tasks and executing them via a sequence of low-level policies. We evaluate HeaP against a range of baselines on a suite of web tasks, including MiniWoB++, WebArena, a mock airline CRM, as well as live website interactions, and show that it is able to outperform prior works using orders of magnitude less data.  ( 2 min )
    Characterization of causal ancestral graphs for time series with latent confounders. (arXiv:2112.08417v2 [stat.ME] UPDATED)
    In this paper, we introduce a novel class of graphical models for representing time lag specific causal relationships and independencies of multivariate time series with unobserved confounders. We completely characterize these graphs and show that they constitute proper subsets of the currently employed model classes. As we show, from the novel graphs one can thus draw stronger causal inferences -- without additional assumptions. We further introduce a graphical representation of Markov equivalence classes of the novel graphs. This graphical representation contains more causal knowledge than what current state-of-the-art causal discovery algorithms learn.  ( 2 min )
    CLASSify: A Web-Based Tool for Machine Learning. (arXiv:2310.03618v1 [cs.LG])
    Machine learning classification problems are widespread in bioinformatics, but the technical knowledge required to perform model training, optimization, and inference can prevent researchers from utilizing this technology. This article presents an automated tool for machine learning classification problems to simplify the process of training models and producing results while providing informative visualizations and insights into the data. This tool supports both binary and multiclass classification problems, and it provides access to a variety of models and methods. Synthetic data can be generated within the interface to fill missing values, balance class labels, or generate entirely new datasets. It also provides support for feature evaluation and generates explainability scores to indicate which features influence the output the most. We present CLASSify, an open-source tool for simplifying the user experience of solving classification problems without the need for knowledge of machine learning.  ( 2 min )
    ECG-SL: Electrocardiogram(ECG) Segment Learning, a deep learning method for ECG signal. (arXiv:2310.00818v2 [cs.LG] UPDATED)
    Electrocardiogram (ECG) is an essential signal in monitoring human heart activities. Researchers have achieved promising results in leveraging ECGs in clinical applications with deep learning models. However, mainstream deep learning approaches usually neglect the periodic and formative attributes of the ECG heartbeat waveform. In this work, we propose a novel ECG-Segment based Learning (ECG-SL) framework to explicitly model the periodic nature of ECG signals. More specifically, ECG signals are first split into heartbeat segments, and then structural features are extracted from each of the segments. Based on the structural features, a temporal model is designed to learn the temporal information for various clinical tasks. Further, because massive amounts of ECG signals are available but labeled data are very limited, we also explore a self-supervised learning strategy to pre-train the models, resulting in significant improvements on downstream tasks. The proposed method outperforms the baseline model and shows competitive performance compared with task-specific methods in three clinical applications: cardiac condition diagnosis, sleep apnea detection, and arrhythmia classification. Further, by visualizing the saliency maps, we find that ECG-SL tends to focus more on each heartbeat's peak and ST range than ResNet does.  ( 2 min )
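    The segment-then-model pipeline can be sketched in a few lines: detect R peaks, cut the signal into beats, summarize each beat with simple structural features, and run a small temporal model over the per-beat sequence. The peak detector, feature set, and GRU head below are illustrative stand-ins, not the ECG-SL architecture.

```python
import numpy as np
import torch
import torch.nn as nn
from scipy.signal import find_peaks

def beat_features(ecg, fs=250):
    """Split an ECG trace into heartbeat segments and extract crude structural features."""
    peaks, _ = find_peaks(ecg, distance=int(0.4 * fs), height=np.percentile(ecg, 90))
    feats = []
    for left, right in zip(peaks[:-1], peaks[1:]):
        seg = ecg[left:right]
        feats.append([len(seg) / fs, seg.max(), seg.min(), seg.mean(), seg.std()])
    return torch.tensor(feats, dtype=torch.float32)   # (num_beats, 5)

class SegmentTemporalModel(nn.Module):
    """Small temporal model over the per-beat feature sequence."""
    def __init__(self, feat_dim=5, hidden=32, num_classes=2):
        super().__init__()
        self.gru = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, beats):                          # beats: (B, num_beats, feat_dim)
        _, h = self.gru(beats)
        return self.head(h[-1])

fs = 250
t = np.arange(0, 10, 1 / fs)
ecg = np.sin(2 * np.pi * 1.2 * t) ** 15 + 0.05 * np.random.randn(t.size)   # toy ECG
logits = SegmentTemporalModel()(beat_features(ecg, fs).unsqueeze(0))
```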
    Co-modeling the Sequential and Graphical Routes for Peptide Representation Learning. (arXiv:2310.02964v2 [cs.LG] UPDATED)
    Peptides are formed by the dehydration condensation of multiple amino acids. The primary structure of a peptide can be represented either as an amino acid sequence or as a molecular graph consisting of atoms and chemical bonds. Previous studies have indicated that deep learning routes specific to sequential and graphical peptide forms exhibit comparable performance on downstream tasks. Despite the fact that these models learn representations of the same modality of peptides, we find that they explain their predictions differently. Considering sequential and graphical models as two experts making inferences from different perspectives, we work on fusing expert knowledge to enrich the learned representations and improve the discriminative performance. To achieve this, we propose a peptide co-modeling method, RepCon, which employs a contrastive learning-based framework to enhance the mutual information of representations from decoupled sequential and graphical end-to-end models. It considers representations from the sequential encoder and the graphical encoder for the same peptide sample as a positive pair and learns to enhance the consistency of representations between positive sample pairs and to repel representations between negative pairs. Empirical studies of RepCon and other co-modeling methods are conducted on open-source discriminative datasets, including aggregation propensity, retention time, antimicrobial peptide prediction, and family classification from the Peptide Database. Our results demonstrate the superiority of the co-modeling approach over independent modeling, as well as the superiority of RepCon over other methods under the co-modeling framework. In addition, attribution analysis of RepCon further corroborates the validity of the approach at the level of model explanation.  ( 3 min )
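    The consistency/repulsion objective described above is, in spirit, an InfoNCE-style contrastive loss between the two encoders' embeddings of the same peptide; a generic sketch is given below, with the caveat that RepCon's exact loss and fusion details may differ.

```python
import torch
import torch.nn.functional as F

def co_modeling_contrastive_loss(z_seq, z_graph, temperature=0.1):
    """InfoNCE-style loss: the sequential and graphical embeddings of the same peptide
    form the positive pair; all other in-batch pairs act as negatives."""
    z_seq = F.normalize(z_seq, dim=-1)
    z_graph = F.normalize(z_graph, dim=-1)
    logits = z_seq @ z_graph.t() / temperature         # (B, B) similarity matrix
    targets = torch.arange(z_seq.size(0))               # positives on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

loss = co_modeling_contrastive_loss(torch.randn(32, 128), torch.randn(32, 128))
```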
    Towards Robust 3D Object Detection In Rainy Conditions. (arXiv:2310.00944v2 [cs.CV] UPDATED)
    LiDAR sensors are used in autonomous driving applications to accurately perceive the environment. However, they are affected by adverse weather conditions such as snow, fog, and rain. These everyday phenomena introduce unwanted noise into the measurements, severely degrading the performance of LiDAR-based perception systems. In this work, we propose a framework for improving the robustness of LiDAR-based 3D object detectors against road spray. Our approach uses a state-of-the-art adverse weather detection network to filter out spray from the LiDAR point cloud, which is then used as input for the object detector. In this way, the detected objects are less affected by the adverse weather in the scene, resulting in a more accurate perception of the environment. In addition to adverse weather filtering, we explore the use of radar targets to further filter false positive detections. Tests on real-world data show that our approach improves the robustness to road spray of several popular 3D object detectors.  ( 2 min )
    Losses over Labels: Weakly Supervised Learning via Direct Loss Construction. (arXiv:2212.06921v2 [cs.LG] UPDATED)
    Owing to the prohibitive costs of generating large amounts of labeled data, programmatic weak supervision is a growing paradigm within machine learning. In this setting, users design heuristics that provide noisy labels for subsets of the data. These weak labels are combined (typically via a graphical model) to form pseudolabels, which are then used to train a downstream model. In this work, we question a foundational premise of the typical weakly supervised learning pipeline: given that the heuristic provides all ``label" information, why do we need to generate pseudolabels at all? Instead, we propose to directly transform the heuristics themselves into corresponding loss functions that penalize differences between our model and the heuristic. By constructing losses directly from the heuristics, we can incorporate more information than is used in the standard weakly supervised pipeline, such as how the heuristics make their decisions, which explicitly informs feature selection during training. We call our method Losses over Labels (LoL) as it creates losses directly from heuristics without going through the intermediate step of a label. We show that LoL improves upon existing weak supervision methods on several benchmark text and image classification tasks and further demonstrate that incorporating gradient information leads to better performance on almost every task.  ( 2 min )
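    A minimal sketch of turning heuristics directly into a loss is shown below: each non-abstaining heuristic contributes a logistic penalty when the model's score disagrees with its vote. This captures the "losses instead of pseudolabels" idea only in outline; the actual LoL construction also incorporates how the heuristics compute their votes, e.g., via gradient information.

```python
import torch
import torch.nn.functional as F

def losses_over_labels_sketch(model_logit, heuristic_votes):
    """Build a training loss directly from weak heuristics instead of pseudolabels.

    model_logit     : (B,) real-valued model score for binary classification.
    heuristic_votes : (B, H) in {-1, 0, +1}; 0 means the heuristic abstains.
    Each firing heuristic adds a logistic penalty when the model disagrees with it.
    """
    mask = (heuristic_votes != 0).float()
    margins = heuristic_votes.float() * model_logit.unsqueeze(-1)   # (B, H)
    per_vote = F.softplus(-margins)                                 # logistic disagreement loss
    return (per_vote * mask).sum() / mask.sum().clamp(min=1.0)

loss = losses_over_labels_sketch(torch.randn(16), torch.randint(-1, 2, (16, 3)))
```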
    Beyond One-Preference-for-All: Multi-Objective Direct Preference Optimization. (arXiv:2310.03708v1 [cs.LG])
    Language models (LMs), despite aligning well with an average labeler through reinforcement learning from human feedback (RLHF), may not universally suit diverse human preferences. Recent approaches therefore opt for customization by collecting multi-dimensional feedback and creating distinct rewards for each dimension (e.g., helpfulness, harmlessness, honesty). LMs can then be tailored to different preferences using multi-objective RL (MORL) with different reward weightings. Yet, RL fine-tuning is unstable and resource-heavy, especially for MORLHF with diverse and usually conflicting objectives. In this paper, we present Multi-Objective Direct Preference Optimization (MODPO), an RL-free algorithm that extends Direct Preference Optimization (DPO) for multiple alignment objectives. Essentially, MODPO trains different LMs to represent different collective reward models that combine all objectives with specific weightings. With a simple cross-entropy loss, the LMs optimized against the MODPO objective are analytically the exact solutions of the original MORLHF objective. Empirical results in safety alignment and long-form question answering confirm that MODPO matches or outperforms existing methods, efficiently producing a Pareto-optimal set of LMs that cater to diverse preferences with 3 times less computational resources compared with MORLHF.  ( 2 min )
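    The following is a rough, hypothetical sketch of a multi-objective DPO-style loss: the implicit policy/reference reward margin on a preference pair is combined with margins from auxiliary reward models under fixed weights. It paraphrases the general idea in the abstract and is not the authors' exact MODPO objective.

```python
import torch
import torch.nn.functional as F

def multi_objective_dpo_style_loss(logp_w, logp_l, ref_logp_w, ref_logp_l,
                                   aux_rewards_w, aux_rewards_l, weights, beta=0.1):
    """Hypothetical multi-objective preference loss in the spirit of MODPO.

    logp_*        : (B,) sequence log-probs under the policy being trained
    ref_logp_*    : (B,) sequence log-probs under the frozen reference model
    aux_rewards_* : (B, K) scores from K auxiliary reward models
    weights       : (K,) non-negative weights for the auxiliary objectives
    """
    implicit_margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    aux_margin = ((aux_rewards_w - aux_rewards_l) * weights).sum(-1)
    return -F.logsigmoid(implicit_margin + aux_margin).mean()
```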
    A Long Way to Go: Investigating Length Correlations in RLHF. (arXiv:2310.03716v1 [cs.CL])
    Great successes have been reported using Reinforcement Learning from Human Feedback (RLHF) to align large language models. Open-source preference datasets and reward models have enabled wider experimentation beyond generic chat settings, particularly to make systems more "helpful" for tasks like web question answering, summarization, and multi-turn dialogue. When optimizing for helpfulness, RLHF has been consistently observed to drive models to produce longer outputs. This paper demonstrates that optimizing for response length is a significant factor behind RLHF's reported improvements in these settings. First, we study the relationship between reward and length for reward models trained on three open-source preference datasets for helpfulness. Here, length correlates strongly with reward, and improvements in reward score are driven in large part by shifting the distribution over output lengths. We then explore interventions during both RL and reward model learning to see if we can achieve the same downstream improvements as RLHF without increasing length. While our interventions mitigate length increases, they aren't uniformly effective across settings. Furthermore, we find that even running RLHF with a reward based solely on length can reproduce most of the downstream improvements over the initial policy model, showing that reward models in these settings have a long way to go.  ( 2 min )
    Which mode is better for federated learning? Centralized or Decentralized. (arXiv:2310.03461v1 [cs.LG])
    Both centralized and decentralized approaches have shown excellent performance and great application value in federated learning (FL). However, current studies do not provide sufficient evidence to show which one performs better. Although from the optimization perspective, decentralized methods can approach the comparable convergence of centralized methods with less communication, its test performance has always been inefficient in empirical studies. To comprehensively explore their behaviors in FL, we study their excess risks, including the joint analysis of both optimization and generalization. We prove that on smooth non-convex objectives, 1) centralized FL (CFL) always generalizes better than decentralized FL (DFL); 2) from perspectives of the excess risk and test error in CFL, adopting partial participation is superior to full participation; and, 3) there is a necessary requirement for the topology in DFL to avoid performance collapse as the training scale increases. Based on some simple hardware metrics, we could evaluate which framework is better in practice. Extensive experiments are conducted on common setups in FL to validate that our theoretical analysis is contextually valid in practical scenarios.  ( 2 min )
    In ChatGPT We Trust? Measuring and Characterizing the Reliability of ChatGPT. (arXiv:2304.08979v2 [cs.CR] UPDATED)
    The way users acquire information is undergoing a paradigm shift with the advent of ChatGPT. Unlike conventional search engines, ChatGPT retrieves knowledge from the model itself and generates answers for users. ChatGPT's impressive question-answering (QA) capability has attracted more than 100 million users within a short period of time but has also raised concerns regarding its reliability. In this paper, we perform the first large-scale measurement of ChatGPT's reliability in the generic QA scenario with a carefully curated set of 5,695 questions across ten datasets and eight domains. We find that ChatGPT's reliability varies across different domains, especially underperforming in law and science questions. We also demonstrate that system roles, originally designed by OpenAI to allow users to steer ChatGPT's behavior, can impact ChatGPT's reliability in an imperceptible way. We further show that ChatGPT is vulnerable to adversarial examples, and even a single character change can negatively affect its reliability in certain cases. We believe that our study provides valuable insights into ChatGPT's reliability and underscores the need for strengthening the reliability and security of large language models (LLMs).  ( 2 min )
    Strategic Evaluation: Subjects, Evaluators, and Society. (arXiv:2310.03655v1 [cs.CY])
    A broad current application of algorithms is in formal and quantitative measures of murky concepts -- like merit -- to make decisions. When people strategically respond to these sorts of evaluations in order to gain favorable decision outcomes, their behavior can be subjected to moral judgments. They may be described as 'gaming the system' or 'cheating,' or (in other cases) investing 'honest effort' or 'improving.' Machine learning literature on strategic behavior has tried to describe these dynamics by emphasizing the efforts expended by decision subjects hoping to obtain a more favorable assessment -- some works offer ways to preempt or prevent such manipulations, some differentiate 'gaming' from 'improvement' behavior, while others aim to measure the effort burden or disparate effects of classification systems. We begin from a different starting point: that the design of an evaluation itself can be understood as furthering goals held by the evaluator which may be misaligned with broader societal goals. To develop the idea that evaluation represents a strategic interaction in which both the evaluator and the subject of their evaluation are operating out of self-interest, we put forward a model that represents the process of evaluation using three interacting agents: a decision subject, an evaluator, and society, representing a bundle of values and oversight mechanisms. We highlight our model's applicability to a number of social systems where one or two players strategically undermine the others' interests to advance their own. Treating evaluators as themselves strategic allows us to re-cast the scrutiny directed at decision subjects towards the incentives that underpin institutional designs of evaluations. The moral standing of strategic behaviors often depends on the moral standing of the evaluations and incentives that provoke such behaviors.
    Extreme sparsification of physics-augmented neural networks for interpretable model discovery in mechanics. (arXiv:2310.03652v1 [cs.CE])
    Data-driven constitutive modeling with neural networks has received increased interest in recent years due to its ability to easily incorporate physical and mechanistic constraints and to overcome the challenging and time-consuming task of formulating phenomenological constitutive laws that can accurately capture the observed material response. However, even though neural network-based constitutive laws have been shown to generalize proficiently, the generated representations are not easily interpretable due to their high number of trainable parameters. Sparse regression approaches exist that allow obtaining interpretable expressions, but the user is tasked with creating a library of model forms, which by construction limits their expressiveness to the functional forms provided in the libraries. In this work, we propose to train regularized physics-augmented neural network-based constitutive models utilizing a smoothed version of $L^{0}$-regularization. This aims to maintain the trustworthiness inherited from the physical constraints, but also enables interpretability, which has not been possible thus far on any type of machine learning-based constitutive model where model forms were not assumed a priori but were actually discovered. During the training process, the network simultaneously fits the training data and penalizes the number of active parameters, while also ensuring constitutive constraints such as thermodynamic consistency. We show that the method can reliably obtain interpretable and trustworthy constitutive models for compressible and incompressible hyperelasticity, yield functions, and hardening models for elastoplasticity, for synthetic and experimental data.
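    One common smooth surrogate for the $L^{0}$ penalty, which could play the role described above, is shown below; the specific smoothing used in the paper may differ.

```python
import torch

def smoothed_l0_penalty(params, sigma=0.1):
    """Smooth surrogate for the L0 'norm': sum_i (1 - exp(-w_i^2 / (2 sigma^2))).
    As sigma -> 0 it counts nonzero parameters; for sigma > 0 it is differentiable,
    so it can be added to the data-fit loss to drive parameters toward exact sparsity."""
    return sum((1.0 - torch.exp(-p**2 / (2.0 * sigma**2))).sum() for p in params)

# usage sketch: total_loss = data_misfit + lambda_reg * smoothed_l0_penalty(model.parameters())
```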
    Network Alignment with Transferable Graph Autoencoders. (arXiv:2310.03272v1 [cs.LG])
    Network alignment is the task of establishing one-to-one correspondences between the nodes of different graphs and finds a plethora of applications in high-impact domains. However, this task is known to be NP-hard in its general form, and existing algorithms do not scale up as the size of the graphs increases. To tackle both challenges we propose a novel generalized graph autoencoder architecture, designed to extract powerful and robust node embeddings, that are tailored to the alignment task. We prove that the generated embeddings are associated with the eigenvalues and eigenvectors of the graphs and can achieve more accurate alignment compared to classical spectral methods. Our proposed framework also leverages transfer learning and data augmentation to achieve efficient network alignment at a very large scale without retraining. Extensive experiments on both network and sub-network alignment with real-world graphs provide corroborating evidence supporting the effectiveness and scalability of the proposed approach.
    Solving Diffusion ODEs with Optimal Boundary Conditions for Better Image Super-Resolution. (arXiv:2305.15357v3 [eess.IV] UPDATED)
    Diffusion models, as a kind of powerful generative model, have given impressive results on image super-resolution (SR) tasks. However, due to the randomness introduced in the reverse process of diffusion models, the performance of diffusion-based SR models fluctuates from one sampling run to another, especially for samplers with few resampled steps. This inherent randomness of diffusion models results in ineffectiveness and instability, making it challenging for users to guarantee the quality of SR results. However, our work takes this randomness as an opportunity: fully analyzing and leveraging it leads to the construction of an effective plug-and-play sampling method that has the potential to benefit a series of diffusion-based SR methods. In more detail, we propose to steadily sample high-quality SR images from pre-trained diffusion-based SR models by solving diffusion ordinary differential equations (diffusion ODEs) with optimal boundary conditions (BCs) and analyze the characteristics between the choices of BCs and their corresponding SR results. Our analysis shows the route to obtain an approximately optimal BC via an efficient exploration of the whole space. The quality of SR results sampled by the proposed method with fewer steps outperforms the quality of results sampled by current methods with randomness from the same pre-trained diffusion-based SR model, which means that our sampling method "boosts" current diffusion-based SR models without any additional training.
    Banach Space Optimality of Neural Architectures With Multivariate Nonlinearities. (arXiv:2310.03696v1 [stat.ML])
    We investigate the variational optimality (specifically, the Banach space optimality) of a large class of neural architectures with multivariate nonlinearities/activation functions. To that end, we construct a new family of Banach spaces defined via a regularization operator and the $k$-plane transform. We prove a representer theorem that states that the solution sets to learning problems posed over these Banach spaces are completely characterized by neural architectures with multivariate nonlinearities. These optimal architectures have skip connections and are tightly connected to orthogonal weight normalization and multi-index models, both of which have received considerable interest in the neural network community. Our framework is compatible with a number of classical nonlinearities including the rectified linear unit (ReLU) activation function, the norm activation function, and the radial basis functions found in the theory of thin-plate/polyharmonic splines. We also show that the underlying spaces are special instances of reproducing kernel Banach spaces and variation spaces. Our results shed light on the regularity of functions learned by neural networks trained on data, particularly with multivariate nonlinearities, and provide new theoretical motivation for several architectural choices found in practice.
    Practical Homomorphic Aggregation for Byzantine ML. (arXiv:2309.05395v3 [cs.LG] UPDATED)
    Due to the large-scale availability of data, machine learning (ML) algorithms are being deployed in distributed topologies, where different nodes collaborate to train ML models over their individual data by exchanging model-related information (e.g., gradients) with a central server. However, distributed learning schemes are notably vulnerable to two threats. First, Byzantine nodes can single-handedly corrupt the learning by sending incorrect information to the server, e.g., erroneous gradients. The standard approach to mitigate such behavior is to use a non-linear robust aggregation method at the server. Second, the server can violate the privacy of the nodes. Recent attacks have shown that exchanging (unencrypted) gradients enables a curious server to recover the totality of the nodes' data. The use of homomorphic encryption (HE), a gold standard security primitive, has extensively been studied as a privacy-preserving solution to distributed learning in non-Byzantine scenarios. However, due to HE's large computational demand especially for high-dimensional ML models, there has not yet been any attempt to design purely homomorphic operators for non-linear robust aggregators. In this work, we present SABLE, the first completely homomorphic and Byzantine robust distributed learning algorithm. SABLE essentially relies on a novel plaintext encoding method that enables us to implement the robust aggregator over batching-friendly BGV. Moreover, this encoding scheme also accelerates state-of-the-art homomorphic sorting with larger security margins and smaller ciphertext size. We perform extensive experiments on image classification tasks and show that our algorithm achieves practical execution times while matching the ML performance of its non-private counterpart.
    PostRainBench: A comprehensive benchmark and a new model for precipitation forecasting. (arXiv:2310.02676v2 [cs.LG] UPDATED)
    Accurate precipitation forecasting is a vital challenge of both scientific and societal importance. Data-driven approaches have emerged as a widely used solution for addressing this challenge. However, solely relying on data-driven approaches has limitations in modeling the underlying physics, making accurate predictions difficult. Coupling AI-based post-processing techniques with traditional Numerical Weather Prediction (NWP) methods offers a more effective solution for improving forecasting accuracy. Despite previous post-processing efforts, accurately predicting heavy rainfall remains challenging due to the imbalanced precipitation data across locations and complex relationships between multiple meteorological variables. To address these limitations, we introduce the PostRainBench, a comprehensive multi-variable NWP post-processing benchmark consisting of three datasets for NWP post-processing-based precipitation forecasting. We propose CAMT, a simple yet effective Channel Attention Enhanced Multi-task Learning framework with a specially designed weighted loss function. Its flexible design allows for easy plug-and-play integration with various backbones. Extensive experimental results on the proposed benchmark show that our method outperforms state-of-the-art methods by 6.3%, 4.7%, and 26.8% in rain CSI on the three datasets respectively. Most notably, our model is the first deep learning-based method to outperform traditional Numerical Weather Prediction (NWP) approaches in extreme precipitation conditions. It shows improvements of 15.6%, 17.4%, and 31.8% over NWP predictions in heavy rain CSI on respective datasets. These results highlight the potential impact of our model in reducing the severe consequences of extreme weather events.
    An Empirical Study of AI Generated Text Detection Tools. (arXiv:2310.01423v1 [cs.CL] CROSS LISTED)
    Since ChatGPT has emerged as a major AIGC model, providing high-quality responses across a wide range of applications (including software development and maintenance), it has attracted much interest from many individuals. ChatGPT has great promise, but there are serious problems that might arise from its misuse, especially in the realms of education and public safety. Several AIGC detectors are available, and they have all been tested on genuine text. However, more study is needed to see how effective they are for multi-domain ChatGPT material. This study aims to fill this need by creating a multi-domain dataset for testing the state-of-the-art APIs and tools for detecting artificially generated information used by universities and other research institutions. A large dataset consisting of articles, abstracts, stories, news, and product reviews was created for this study. The second step is to use the newly created dataset to put six tools through their paces. Six different artificial intelligence (AI) text identification systems, including "GPTkit," "GPTZero," "Originality," "Sapling," "Writer," and "Zylalab," have accuracy rates between 55.29% and 97.0%. Although all the tools fared well in the evaluations, Originality was particularly effective across the board.
    Enhanced Human-Robot Collaboration using Constrained Probabilistic Human-Motion Prediction. (arXiv:2310.03314v1 [cs.RO])
    Human motion prediction is an essential step for efficient and safe human-robot collaboration. Current methods either purely rely on representing the human joints in some form of neural network-based architecture or use regression models offline to fit hyper-parameters in the hope of capturing a model encompassing human motion. While these methods provide good initial results, they are missing out on leveraging well-studied human body kinematic models as well as body and scene constraints which can help boost the efficacy of these prediction frameworks while also explicitly avoiding implausible human joint configurations. We propose a novel human motion prediction framework that incorporates human joint constraints and scene constraints in a Gaussian Process Regression (GPR) model to predict human motion over a set time horizon. This formulation is combined with an online context-aware constraints model to leverage task-dependent motions. It is tested on a human arm kinematic model and implemented on a human-robot collaborative setup with a UR5 robot arm to demonstrate the real-time capability of our approach. Simulations were also performed on datasets like HA4M and ANDY. The simulation and experimental results demonstrate considerable improvements in a Gaussian Process framework when these constraints are explicitly considered.
    Efficient Anatomical Labeling of Pulmonary Tree Structures via Implicit Point-Graph Networks. (arXiv:2309.17329v2 [cs.CV] UPDATED)
    Pulmonary diseases rank prominently among the principal causes of death worldwide. Curing them will require, among other things, a better understanding of the many complex 3D tree-shaped structures within the pulmonary system, such as airways, arteries, and veins. In theory, they can be modeled using high-resolution image stacks. Unfortunately, standard CNN approaches operating on dense voxel grids are prohibitively expensive. To remedy this, we introduce a point-based approach that preserves graph connectivity of tree skeleton and incorporates an implicit surface representation. It delivers SOTA accuracy at a low computational cost and the resulting models have usable surfaces. Due to the scarcity of publicly accessible data, we have also curated an extensive dataset to evaluate our approach and will make it public.
    Fictitious Cross-Play: Learning Global Nash Equilibrium in Mixed Cooperative-Competitive Games. (arXiv:2310.03354v1 [cs.AI])
    Self-play (SP) is a popular multi-agent reinforcement learning (MARL) framework for solving competitive games, where each agent optimizes policy by treating others as part of the environment. Despite the empirical successes, the theoretical properties of SP-based methods are limited to two-player zero-sum games. However, for mixed cooperative-competitive games where agents on the same team need to cooperate with each other, we can show a simple counter-example where SP-based methods cannot converge to a global Nash equilibrium (NE) with high probability. Alternatively, Policy-Space Response Oracles (PSRO) is an iterative framework for learning NE, where the best responses w.r.t. previous policies are learned in each iteration. PSRO can be directly extended to mixed cooperative-competitive settings by jointly learning team best responses with all convergence properties unchanged. However, PSRO requires repeatedly training joint policies from scratch till convergence, which makes it hard to scale to complex games. In this work, we develop a novel algorithm, Fictitious Cross-Play (FXP), which inherits the benefits from both frameworks. FXP simultaneously trains an SP-based main policy and a counter population of best response policies. The main policy is trained by fictitious self-play and cross-play against the counter population, while the counter policies are trained as the best responses to the main policy's past versions. We validate our method in matrix games and show that FXP converges to global NEs while SP methods fail. We also conduct experiments in a gridworld domain, where FXP achieves higher Elo ratings and lower exploitabilities than baselines, and a more challenging football game, where FXP defeats SOTA models with over 94% win rate.
    Machine learning the interaction network in coupled dynamical systems. (arXiv:2310.03378v1 [math.DS])
    The study of interacting dynamical systems continues to attract research interest in various fields of science and engineering. In a collection of interacting particles, the interaction network contains information about how various components interact with one another. Inferring the information about the interaction network from the dynamics of agents is a problem of long-standing interest. In this work, we employ a self-supervised neural network model to achieve two outcomes: to recover the interaction network and to predict the dynamics of individual agents. Both are inferred solely from the observed trajectory data. This work presents an application of the Neural Relational Inference model to two dynamical systems: coupled particles interacting via Hooke's law and coupled phase (Kuramoto) oscillators.
    Three-Way Trade-Off in Multi-Objective Learning: Optimization, Generalization and Conflict-Avoidance. (arXiv:2305.20057v3 [cs.LG] UPDATED)
    Multi-objective learning (MOL) problems often arise in emerging machine learning problems when there are multiple learning criteria, data modalities, or learning tasks. Different from single-objective learning, one of the critical challenges in MOL is the potential conflict among different objectives during the iterative optimization process. Recent works have developed various dynamic weighting algorithms for MOL such as MGDA and its variants, where the central idea is to find an update direction that avoids conflicts among objectives. Albeit its appealing intuition, empirical studies show that dynamic weighting methods may not always outperform static ones. To understand this theory-practice gap, we focus on a new stochastic variant of MGDA -- the Multi-objective gradient with Double sampling (MoDo) algorithm -- and study the generalization performance of the dynamic weighting-based MoDo and its interplay with optimization through the lens of algorithm stability. Perhaps surprisingly, we find that the key rationale behind MGDA -- updating along a conflict-avoidant direction -- may hinder dynamic weighting algorithms from achieving the optimal ${\cal O}(1/\sqrt{n})$ population risk, where $n$ is the number of training samples. We further demonstrate the impact of the variability of dynamic weights on the three-way trade-off among optimization, generalization, and conflict avoidance that is unique in MOL. We showcase the generality of our theoretical framework by analyzing other existing stochastic MOL algorithms under the framework. Experiments on various multi-task learning benchmarks are performed to demonstrate the practical applicability. Code is available at https://github.com/heshandevaka/Trade-Off-MOL.
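    The sketch below illustrates, under stated assumptions, what a double-sampling multi-objective step could look like: two independent stochastic Jacobians drive the update of the simplex weights, and a third drives the parameter update along the weighted direction. The step details (the crude projection, which sample moves the parameters) follow a reading of the abstract, not the paper's pseudocode.

```python
import torch

def double_sampling_mol_step(params, objectives, lam, sample_batch, lr_x=1e-2, lr_lam=1e-2):
    """One sketched multi-objective step with double sampling.

    params       : a single tensor with requires_grad=True (the shared model parameters)
    objectives   : list of callables f(params, batch) -> scalar loss
    lam          : (M,) weights on the probability simplex
    sample_batch : callable returning an independent minibatch each time it is called
    """
    def stacked_grads(batch):
        grads = []
        for f in objectives:
            g = torch.autograd.grad(f(params, batch), params)[0]
            grads.append(g.flatten())
        return torch.stack(grads)                       # (M, dim)

    g1 = stacked_grads(sample_batch())                  # first independent sample
    g2 = stacked_grads(sample_batch())                  # second independent sample

    lam = lam - lr_lam * (g1 @ (g2.t() @ lam))          # gradient step on the weights
    lam = torch.clamp(lam, min=0.0)                     # crude simplex projection:
    lam = lam / lam.sum().clamp(min=1e-8)               # clip negatives, renormalize

    g3 = stacked_grads(sample_batch())                  # third sample for the parameter move
    with torch.no_grad():
        params -= lr_x * (g3.t() @ lam).view_as(params)
    return params, lam

# usage sketch with two toy quadratic objectives over a shared parameter vector
params = torch.randn(4, requires_grad=True)
objs = [lambda p, b: ((p - b) ** 2).sum(), lambda p, b: ((p + b) ** 2).sum()]
lam = torch.ones(2) / 2
for _ in range(100):
    params, lam = double_sampling_mol_step(params, objs, lam, lambda: torch.randn(4))
```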
    CLEVRER-Humans: Describing Physical and Causal Events the Human Way. (arXiv:2310.03635v1 [cs.AI])
    Building machines that can reason about physical events and their causal relationships is crucial for flexible interaction with the physical world. However, most existing physical and causal reasoning benchmarks are exclusively based on synthetically generated events and synthetic natural language descriptions of causal relationships. This design brings up two issues. First, there is a lack of diversity in both event types and natural language descriptions; second, causal relationships based on manually-defined heuristics are different from human judgments. To address both shortcomings, we present the CLEVRER-Humans benchmark, a video reasoning dataset for causal judgment of physical events with human labels. We employ two techniques to improve data collection efficiency: first, a novel iterative event cloze task to elicit a new representation of events in videos, which we term Causal Event Graphs (CEGs); second, a data augmentation technique based on neural language generative models. We convert the collected CEGs into questions and answers to be consistent with prior work. Finally, we study a collection of baseline approaches for CLEVRER-Humans question-answering, highlighting the great challenges set forth by our benchmark.
    SqueezeLLM: Dense-and-Sparse Quantization. (arXiv:2306.07629v2 [cs.CL] UPDATED)
    Generative Large Language Models (LLMs) have demonstrated remarkable results for a wide range of tasks. However, deploying these models for inference has been a significant challenge due to their unprecedented resource requirements. This has forced existing deployment frameworks to use multi-GPU inference pipelines, which are often complex and costly, or to use smaller and less performant models. In this work, we demonstrate that the main bottleneck for generative inference with LLMs is memory bandwidth, rather than compute, specifically for single batch inference. While quantization has emerged as a promising solution by representing model weights with reduced precision, previous efforts have often resulted in notable performance degradation. To address this, we introduce SqueezeLLM, a post-training quantization framework that not only enables lossless compression to ultra-low precisions of up to 3-bit, but also achieves higher quantization performance under the same memory constraint. Our framework incorporates two novel ideas: (i) sensitivity-based non-uniform quantization, which searches for the optimal bit precision assignment based on second-order information; and (ii) the Dense-and-Sparse decomposition that stores outliers and sensitive weight values in an efficient sparse format. When applied to the LLaMA models, our 3-bit quantization significantly reduces the perplexity gap from the FP16 baseline by up to 2.1x as compared to the state-of-the-art methods with the same memory requirement. Furthermore, when deployed on an A6000 GPU, our quantized models achieve up to 2.3x speedup compared to the baseline. Our code is open-sourced and available online.
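    The Dense-and-Sparse idea can be illustrated independently of the paper's kernels: weights above a magnitude percentile are pulled into a sparse matrix kept at full precision, and the remaining dense part is quantized. The threshold and the uniform quantizer below are simplifying assumptions; the paper instead uses sensitivity-based non-uniform quantization for the dense part.

```python
import numpy as np
from scipy import sparse

def dense_and_sparse_split(W, outlier_pct=0.5, n_bits=3):
    """Split W into a sparse full-precision outlier part and a quantized dense part."""
    thresh = np.percentile(np.abs(W), 100.0 - outlier_pct)
    outlier_mask = np.abs(W) > thresh
    W_sparse = sparse.csr_matrix(np.where(outlier_mask, W, 0.0))  # outliers kept in full precision

    W_dense = np.where(outlier_mask, 0.0, W)
    # Placeholder uniform quantizer; SqueezeLLM uses sensitivity-based
    # non-uniform (clustered) quantization instead.
    levels = 2 ** n_bits - 1
    scale = (W_dense.max() - W_dense.min()) / levels or 1.0
    W_q = np.round((W_dense - W_dense.min()) / scale) * scale + W_dense.min()
    return W_q, W_sparse

W = np.random.randn(256, 256).astype(np.float32)
W_q, W_out = dense_and_sparse_split(W)
recon = W_q + W_out.toarray()   # approximate reconstruction of the original weights
```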
    An Integrated Algorithm for Robust and Imperceptible Audio Adversarial Examples. (arXiv:2310.03349v1 [cs.SD])
    Audio adversarial examples are audio files that have been manipulated to fool an automatic speech recognition (ASR) system, while still sounding benign to a human listener. Most methods to generate such samples are based on a two-step algorithm: first, a viable adversarial audio file is produced; then, it is fine-tuned with respect to perceptibility and robustness. In this work, we present an integrated algorithm that uses psychoacoustic models and room impulse responses (RIR) in the generation step. The RIRs are dynamically created by a neural network during the generation process to simulate a physical environment to harden our examples against transformations experienced in over-the-air attacks. We compare the different approaches in three experiments: in a simulated environment and in a realistic over-the-air scenario to evaluate the robustness, and in a human study to evaluate the perceptibility. Our algorithms that incorporate psychoacoustics, either alone or in addition to the robustness objective, show an improvement in the signal-to-noise ratio (SNR) as well as in the human perception study, at the cost of an increased word error rate (WER).
    Towards practical reinforcement learning for tokamak magnetic control. (arXiv:2307.11546v2 [physics.plasm-ph] UPDATED)
    Reinforcement learning (RL) has shown promising results for real-time control systems, including the domain of plasma magnetic control. However, there are still significant drawbacks compared to traditional feedback control approaches for magnetic confinement. In this work, we address key drawbacks of the RL method: achieving higher control accuracy for desired plasma properties, reducing the steady-state error, and decreasing the required time to learn new tasks. We build on top of \cite{degrave2022magnetic}, and present algorithmic improvements to the agent architecture and training procedure. We present simulation results that show up to 65\% improvement in shape accuracy, achieve substantial reduction in the long-term bias of the plasma current, and additionally reduce the training time required to learn new tasks by a factor of 3 or more. We present new experiments using the upgraded RL-based controllers on the TCV tokamak, which validate the simulation results achieved, and point the way towards routinely achieving accurate discharges using the RL approach.
    Deep Geometric Learning with Monotonicity Constraints for Alzheimer's Disease Progression. (arXiv:2310.03353v1 [cs.AI])
    Alzheimer's disease (AD) is a devastating neurodegenerative condition that precedes progressive and irreversible dementia; thus, predicting its progression over time is vital for clinical diagnosis and treatment. Numerous studies have implemented structural magnetic resonance imaging (MRI) to model AD progression, focusing on three integral aspects: (i) temporal variability, (ii) incomplete observations, and (iii) temporal geometric characteristics. However, deep learning-based approaches that address data variability and sparsity have yet to sufficiently consider inherent geometrical properties. The ordinary differential equation-based geometric modeling method (ODE-RGRU) has recently emerged as a promising strategy for modeling time-series data by intertwining a recurrent neural network and an ODE in Riemannian space. Despite its achievements, ODE-RGRU encounters limitations when extrapolating positive definite symmetric metrics from incomplete samples, leading to feature reversals that are particularly problematic in the clinical setting. Therefore, this study proposes a novel geometric learning approach that models longitudinal MRI biomarkers and cognitive scores by combining three modules: topological space shift, ODE-RGRU, and trajectory estimation. We have also developed a training algorithm that integrates manifold mapping with monotonicity constraints to reflect measurement transition irreversibility. We verify our proposed method's efficacy by predicting clinical labels and cognitive scores over time in regular and irregular settings. Furthermore, we thoroughly analyze our proposed framework through an ablation study.
    Swin-Tempo: Temporal-Aware Lung Nodule Detection in CT Scans as Video Sequences Using Swin Transformer-Enhanced UNet. (arXiv:2310.03365v1 [eess.IV])
    Lung cancer is highly lethal, emphasizing the critical need for early detection. However, identifying lung nodules poses significant challenges for radiologists, who rely heavily on their expertise and experience for accurate diagnosis. To address this issue, computer-aided diagnosis systems based on machine learning techniques have emerged to assist doctors in identifying lung nodules from computed tomography (CT) scans. Unfortunately, existing networks in this domain often suffer from computational complexity, leading to high rates of false negatives and false positives, limiting their effectiveness. To address these challenges, we present an innovative model that harnesses the strengths of both convolutional neural networks and vision transformers. Inspired by object detection in videos, we treat each 3D CT image as a video, individual slices as frames, and lung nodules as objects, enabling a time-series application. The primary objective of our work is to overcome hardware limitations during model training, allowing for efficient processing of 2D data while utilizing inter-slice information for accurate identification based on 3D image context. We validated the proposed network by applying a 10-fold cross-validation technique to the publicly available Lung Nodule Analysis 2016 dataset. Our proposed architecture achieves an average sensitivity of 97.84% and a competition performance metric (CPM) of 96.0% with few parameters. Comparative analysis with state-of-the-art advancements in lung nodule identification demonstrates the significant accuracy achieved by our proposed model.
    Paying Attention to Astronomical Transients: Introducing the Time-series Transformer for Photometric Classification. (arXiv:2105.06178v3 [astro-ph.IM] UPDATED)
    Future surveys such as the Legacy Survey of Space and Time (LSST) of the Vera C. Rubin Observatory will observe an order of magnitude more astrophysical transient events than any previous survey. With this deluge of photometric data, it will be impossible for all such events to be classified by humans alone. Recent efforts have sought to leverage machine learning methods to tackle the challenge of astronomical transient classification, with ever improving success. Transformers are a recently developed deep learning architecture, first proposed for natural language processing, that have shown a great deal of recent success. In this work we develop a new transformer architecture, which uses multi-head self attention at its core, for general multi-variate time-series data. Furthermore, the proposed time-series transformer architecture supports the inclusion of an arbitrary number of additional features, while also offering interpretability. We apply the time-series transformer to the task of photometric classification, minimising the reliance on expert domain knowledge for feature selection, while achieving results comparable to state-of-the-art photometric classification methods. We achieve a logarithmic-loss of 0.507 on imbalanced data in a representative setting using data from the Photometric LSST Astronomical Time-Series Classification Challenge (PLAsTiCC). Moreover, we achieve a micro-averaged receiver operating characteristic area under curve of 0.98 and micro-averaged precision-recall area under curve of 0.87.
    DeepHGCN: Toward Deeper Hyperbolic Graph Convolutional Networks. (arXiv:2310.02027v2 [cs.LG] UPDATED)
    Hyperbolic graph convolutional networks (HGCN) have demonstrated significant potential in extracting information from hierarchical graphs. However, existing HGCNs are limited to shallow architectures, due to the expensive hyperbolic operations and the over-smoothing issue as depth increases. Although in GCNs, treatments have been applied to alleviate over-smoothing, developing a hyperbolic therapy presents distinct challenges since operations should be carefully designed to fit the hyperbolic nature. Addressing the above challenges, in this work, we propose DeepHGCN, the first deep multi-layer HGCN architecture with dramatically improved computational efficiency and substantially alleviated over-smoothing effect. DeepHGCN presents two key enablers of deep HGCNs: (1) a novel hyperbolic feature transformation layer that enables fast and accurate linear maps; and (2) techniques such as hyperbolic residual connections and regularization for both weights and features, facilitated by an efficient hyperbolic midpoint method. Extensive experiments demonstrate that DeepHGCN obtains significant improvements in link prediction and node classification tasks compared to both Euclidean and shallow hyperbolic GCN variants.
    Rethinking Fairness for Human-AI Collaboration. (arXiv:2310.03647v1 [cs.LG])
    Existing approaches to algorithmic fairness aim to ensure equitable outcomes if human decision-makers comply perfectly with algorithmic decisions. However, perfect compliance with the algorithm is rarely a reality or even a desirable outcome in human-AI collaboration. Yet, recent studies have shown that selective compliance with fair algorithms can amplify discrimination relative to the prior human policy. As a consequence, ensuring equitable outcomes requires fundamentally different algorithmic design principles that ensure robustness to the decision-maker's (a priori unknown) compliance pattern. We define the notion of compliance-robustly fair algorithmic recommendations that are guaranteed to (weakly) improve fairness in decisions, regardless of the human's compliance pattern. We propose a simple optimization strategy to identify the best performance-improving compliance-robustly fair policy. However, we show that it may be infeasible to design algorithmic recommendations that are simultaneously fair in isolation, compliance-robustly fair, and more accurate than the human policy; thus, if our goal is to improve the equity and accuracy of human-AI collaboration, it may not be desirable to enforce traditional fairness constraints.  ( 2 min )
    Modularizing while Training: A New Paradigm for Modularizing DNN Models. (arXiv:2306.09376v3 [cs.LG] UPDATED)
    Deep neural network (DNN) models have become increasingly crucial components in intelligent software systems. However, training a DNN model is typically expensive in terms of both time and money. To address this issue, researchers have recently focused on reusing existing DNN models, borrowing the idea of code reuse in software engineering. However, reusing an entire model could cause extra overhead or inherit weaknesses from undesired functionalities. Hence, existing work proposes to decompose an already trained model into modules, i.e., modularizing-after-training, and enable module reuse. Since trained models are not built for modularization, modularizing-after-training incurs huge overhead and model accuracy loss. In this paper, we propose a novel approach that incorporates modularization into the model training process, i.e., modularizing-while-training (MwT). We train a model to be structurally modular through two loss functions that optimize intra-module cohesion and inter-module coupling. We have implemented the proposed approach for modularizing Convolutional Neural Network (CNN) models in this work. The evaluation results on representative models demonstrate that MwT outperforms the state-of-the-art approach. Specifically, the accuracy loss caused by MwT is only 1.13 percentage points, which is 1.76 percentage points less than that of the baseline. The kernel retention rate of the modules generated by MwT is only 14.58%, with a reduction of 74.31% over the state-of-the-art approach. Furthermore, the total time cost required for training and modularizing is only 108 minutes, half that of the baseline.
    Diffeomorphic Multi-Resolution Deep Learning Registration for Applications in Breast MRI. (arXiv:2309.13777v2 [eess.IV] UPDATED)
    In breast surgical planning, accurate registration of MR images across patient positions has the potential to improve the localisation of tumours during breast cancer treatment. While learning-based registration methods have recently become the state-of-the-art approach for most medical image registration tasks, these methods have yet to make inroads into breast image registration due to certain difficulties: the lack of rich texture information in breast MR images and the need for the deformations to be diffeomorphic. In this work, we propose learning strategies for breast MR image registration that are amenable to diffeomorphic constraints, together with early experimental results from in-silico and in-vivo experiments. One key contribution of this work is a registration network which produces superior registration outcomes for breast images in addition to providing diffeomorphic guarantees.
    Learning to Simplify Spatial-Temporal Graphs in Gait Analysis. (arXiv:2310.03396v1 [cs.CV])
    Gait analysis leverages unique walking patterns for person identification and assessment across multiple domains. Among the methods used for gait analysis, skeleton-based approaches have shown promise due to their robust and interpretable features. However, these methods often rely on hand-crafted spatial-temporal graphs that are based on human anatomy disregarding the particularities of the dataset and task. This paper proposes a novel method to simplify the spatial-temporal graph representation for gait-based gender estimation, improving interpretability without losing performance. Our approach employs two models, an upstream and a downstream model, that can adjust the adjacency matrix for each walking instance, thereby removing the fixed nature of the graph. By employing the Straight-Through Gumbel-Softmax trick, our model is trainable end-to-end. We demonstrate the effectiveness of our approach on the CASIA-B dataset for gait-based gender estimation. The resulting graphs are interpretable and differ qualitatively from fixed graphs used in existing models. Our research contributes to enhancing the explainability and task-specific adaptability of gait recognition, promoting more efficient and reliable gait-based biometrics.
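    A hedged sketch of the Straight-Through Gumbel-Softmax trick applied to edge selection follows: per-edge logits are relaxed into hard keep/drop decisions that remain differentiable, which is the general mechanism the paper names. The logit parameterization, temperature, and toy loss below are illustrative assumptions rather than the paper's models.

```python
import torch
import torch.nn.functional as F

def sample_adjacency(edge_logits, tau=1.0):
    """Sample a hard 0/1 edge mask with straight-through gradients.

    edge_logits: (num_edges, 2) unnormalized scores for [drop, keep].
    """
    # hard=True returns one-hot samples; gradients flow through the soft relaxation.
    one_hot = F.gumbel_softmax(edge_logits, tau=tau, hard=True)
    return one_hot[:, 1]   # 1.0 where the edge is kept, 0.0 otherwise

edge_logits = torch.zeros(16, 2, requires_grad=True)   # 16 candidate edges
keep_mask = sample_adjacency(edge_logits)
loss = keep_mask.sum()            # any downstream task loss would go here
loss.backward()                   # gradients reach edge_logits via the ST estimator
```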
    Bridging the Gap Between Foundation Models and Heterogeneous Federated Learning. (arXiv:2310.00247v2 [cs.LG] UPDATED)
    Federated learning (FL) offers privacy-preserving decentralized machine learning, optimizing models at edge clients without sharing private data. Simultaneously, foundation models (FMs) have gained traction in the artificial intelligence (AI) community due to their exceptional performance across various tasks. However, integrating FMs into FL presents challenges, primarily due to their substantial size and intensive resource requirements. This is especially true when considering the resource heterogeneity in edge FL systems. We present an adaptive framework for Resource-aware Federated Foundation Models (RaFFM) to address these challenges. RaFFM introduces specialized model compression algorithms tailored for FL scenarios, such as salient parameter prioritization and high-performance subnetwork extraction. These algorithms enable dynamic scaling of given transformer-based FMs to fit heterogeneous resource constraints at the network edge during both FL's optimization and deployment stages. Experimental results demonstrate that RaFFM shows significant superiority in resource utilization efficiency and uses fewer resources to deploy FMs to FL. Despite the lower resource consumption, target models optimized by RaFFM achieve performance on par with traditional FL methods applied to full-sized FMs. This is evident across tasks in both natural language processing and computer vision domains.
    Towards Optimal Neural Networks: the Role of Sample Splitting in Hyperparameter Selection. (arXiv:2307.07726v2 [stat.ML] UPDATED)
    While artificial neural networks have demonstrated exceptional practical success in a variety of domains, investigations into their theoretical characteristics, such as their approximation power, statistical properties, and generalization performance, have concurrently made significant strides. In this paper, we construct a novel theory for understanding the effectiveness of neural networks, which offers a perspective distinct from prior research. Specifically, we explore the rationale underlying a common practice during the construction of neural network models: sample splitting. Our findings indicate that the optimal hyperparameters derived from sample splitting can enable a neural network model that asymptotically minimizes the prediction risk. We conduct extensive experiments across different application scenarios and network architectures, and the results corroborate the effectiveness of our theory.
    Mechanic Maker 2.0: Reinforcement Learning for Evaluating Generated Rules. (arXiv:2309.09476v3 [cs.AI] UPDATED)
    Automated game design (AGD), the study of automatically generating game rules, has a long history in technical games research. AGD approaches generally rely on approximations of human play, either objective functions or AI agents. Despite this, the majority of these approximators are static, meaning they do not reflect human players' ability to learn and improve in a game. In this paper, we investigate the application of Reinforcement Learning (RL) as an approximator for human play for rule generation. We recreate the classic AGD environment Mechanic Maker in Unity as a new, open-source rule generation framework. Our results demonstrate that RL produces distinct sets of rules from an A* agent baseline, which may be more usable by humans.
    TRAM: Bridging Trust Regions and Sharpness Aware Minimization. (arXiv:2310.03646v1 [cs.LG])
    By reducing the curvature of the loss surface in the parameter space, Sharpness-aware minimization (SAM) yields widespread robustness improvement under domain transfer. Instead of focusing on parameters, however, this work considers the transferability of representations as the optimization target for out-of-domain generalization in a fine-tuning setup. To encourage the retention of transferable representations, we consider trust region-based fine-tuning methods, which exploit task-specific skills without forgetting task-agnostic representations from pre-training. We unify parameter- and representation-space smoothing approaches by using trust region bounds to inform SAM-style regularizers on both of these optimization surfaces. We propose Trust Region Aware Minimization (TRAM), a fine-tuning algorithm that optimizes for flat minima and smooth, informative representations without forgetting pre-trained structure. We find that TRAM outperforms both sharpness-aware and trust region-based optimization methods on cross-domain language modeling and cross-lingual transfer, where robustness to domain transfer and representation generality are critical for success. TRAM establishes a new standard in training generalizable models with minimal additional computation.
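    For context, a minimal PyTorch sketch of the SAM-style two-step update that TRAM builds on is given below: perturb the weights toward the locally worst direction, then descend from the perturbed point. The trust-region-informed choice of the perturbation and its representation-space counterpart are the paper's contributions and are not captured here; the radius and optimizer are placeholders.

```python
import torch

def sam_step(model, loss_fn, batch, base_opt, rho=0.05):
    """One SAM-style update: ascend to a nearby worst-case weight perturbation, then descend."""
    params = [p for p in model.parameters() if p.requires_grad]

    # First pass: gradient at the current weights.
    loss_fn(model, batch).backward()
    grads = [p.grad.detach().clone() if p.grad is not None else torch.zeros_like(p)
             for p in params]
    grad_norm = torch.sqrt(sum((g ** 2).sum() for g in grads)) + 1e-12

    # Step 1: perturb weights along the normalized gradient (sharpness probe).
    eps = [rho * g / grad_norm for g in grads]
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.add_(e)

    # Step 2: gradient at the perturbed point, undo the perturbation, then descend.
    model.zero_grad()
    loss_fn(model, batch).backward()
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)
    base_opt.step()
    model.zero_grad()
```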
    Neural Operators for Accelerating Scientific Simulations and Design. (arXiv:2309.15325v2 [cs.LG] UPDATED)
    Scientific discovery and engineering design are currently limited by the time and cost of physical experiments, selected mostly through trial-and-error and intuition that require deep domain expertise. Numerical simulations present an alternative to physical experiments but are usually infeasible for complex real-world domains due to the computational requirements of existing numerical methods. Artificial intelligence (AI) presents a potential paradigm shift by developing fast data-driven surrogate models. In particular, an AI framework, known as neural operators, presents a principled framework for learning mappings between functions defined on continuous domains, e.g., spatiotemporal processes and partial differential equations (PDE). They can extrapolate and predict solutions at new locations unseen during training, i.e., perform zero-shot super-resolution. Neural operators can augment or even replace existing simulators in many applications, such as computational fluid dynamics, weather forecasting, and material modeling, while being 4-5 orders of magnitude faster. Further, neural operators can be integrated with physics and other domain constraints enforced at finer resolutions to obtain high-fidelity solutions and good generalization. Since neural operators are differentiable, they can directly optimize parameters for inverse design and other inverse problems. We believe that neural operators present a transformative approach to simulation and design, enabling rapid research and development.
    Ablation Study to Clarify the Mechanism of Object Segmentation in Multi-Object Representation Learning. (arXiv:2310.03273v1 [cs.CV])
    Multi-object representation learning aims to represent complex real-world visual input using the composition of multiple objects. Representation learning methods have often used unsupervised learning to segment an input image into individual objects and encode these objects into each latent vector. However, it is not clear how previous methods have achieved the appropriate segmentation of individual objects. Additionally, most of the previous methods regularize the latent vectors using a Variational Autoencoder (VAE). Therefore, it is not clear whether VAE regularization contributes to appropriate object segmentation. To elucidate the mechanism of object segmentation in multi-object representation learning, we conducted an ablation study on MONet, which is a typical method. MONet represents multiple objects using pairs that consist of an attention mask and the latent vector corresponding to the attention mask. Each latent vector is encoded from the input image and attention mask. Then, the component image and attention mask are decoded from each latent vector. The loss function of MONet consists of 1) the sum of reconstruction losses between the input image and decoded component image, 2) the VAE regularization loss of the latent vector, and 3) the reconstruction loss of the attention mask to explicitly encode shape information. We conducted an ablation study on these three loss functions to investigate the effect on segmentation performance. Our results showed that the VAE regularization loss did not affect segmentation performance, while the other losses did. Based on this result, we hypothesize that it is important to maximize the attention mask of the image region best represented by a single latent vector corresponding to the attention mask. We confirmed this hypothesis by evaluating a new loss function with the same mechanism as the hypothesis.
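    The three ablated terms can be written schematically as follows; this is a sketch assuming precomputed per-slot reconstructions, masks, and Gaussian latents, not MONet's full mixture likelihood.

```python
import torch
import torch.nn.functional as F

def monet_style_loss(image, recons, masks, mask_logits, mu, logvar,
                     beta=0.5, gamma=0.5):
    """Schematic MONet-style objective with the three terms ablated in the study.

    recons, masks: per-slot component reconstructions and attention masks,
    shapes (slots, B, C, H, W) and (slots, B, 1, H, W); mu, logvar: latent stats.
    """
    # (1) Reconstruction: mask-weighted error between input and component images.
    recon_loss = ((masks * (recons - image.unsqueeze(0)) ** 2).sum(dim=0)).mean()

    # (2) VAE regularization: KL between the latent posterior and a unit Gaussian.
    kl_loss = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=-1).mean()

    # (3) Mask reconstruction: decoded mask logits should match the attention masks.
    mask_loss = F.binary_cross_entropy_with_logits(mask_logits, masks)

    return recon_loss + beta * kl_loss + gamma * mask_loss
```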
    FedJETs: Efficient Just-In-Time Personalization with Federated Mixture of Experts. (arXiv:2306.08586v2 [cs.LG] UPDATED)
    One of the goals in Federated Learning (FL) is to create personalized models that can adapt to the context of each participating client, while utilizing knowledge from a shared global model. Yet, often, personalization requires a fine-tuning step using clients' labeled data in order to achieve good performance. This may not be feasible in scenarios where incoming clients are fresh and/or have privacy concerns. It, then, remains open how one can achieve just-in-time personalization in these scenarios. We propose FedJETs, a novel solution by using a Mixture-of-Experts (MoE) framework within an FL setup. Our method leverages the diversity of the clients to train specialized experts on different subsets of classes, and a gating function to route the input to the most relevant expert(s). Our gating function harnesses the knowledge of a pretrained common expert model to enhance its routing decisions on-the-fly. As a highlight, our approach can improve accuracy by up to 18\% in state-of-the-art FL settings, while maintaining competitive zero-shot performance. In practice, our method can handle non-homogeneous data distributions, scale more efficiently, and improve the state-of-the-art performance on common FL benchmarks.
    BioBridge: Bridging Biomedical Foundation Models via Knowledge Graph. (arXiv:2310.03320v1 [cs.LG])
    Foundation models (FMs) are able to leverage large volumes of unlabeled data to demonstrate superior performance across a wide range of tasks. However, FMs developed for biomedical domains have largely remained unimodal, i.e., independently trained and used for tasks on protein sequences alone, small molecule structures alone, or clinical data alone. To overcome this limitation of biomedical FMs, we present BioBridge, a novel parameter-efficient learning framework, to bridge independently trained unimodal FMs to establish multimodal behavior. BioBridge achieves this by utilizing Knowledge Graphs (KG) to learn transformations between one unimodal FM and another without fine-tuning any underlying unimodal FMs. Our empirical results demonstrate that BioBridge can beat the best baseline KG embedding methods (on average by around 76.3%) in cross-modal retrieval tasks. We also find that BioBridge demonstrates out-of-domain generalization ability by extrapolating to unseen modalities or relations. Additionally, we show that BioBridge presents itself as a general-purpose retriever that can aid biomedical multimodal question answering as well as enhance the guided generation of novel drugs.
    Targeted Adversarial Attacks on Generalizable Neural Radiance Fields. (arXiv:2310.03578v1 [cs.LG])
    Neural Radiance Fields (NeRFs) have recently emerged as a powerful tool for 3D scene representation and rendering. These data-driven models can learn to synthesize high-quality images from sparse 2D observations, enabling realistic and interactive scene reconstructions. However, the growing usage of NeRFs in critical applications such as augmented reality, robotics, and virtual environments could be threatened by adversarial attacks. In this paper, we show how generalizable NeRFs can be attacked by both low-intensity adversarial attacks and adversarial patches, where the latter could be robust enough to be used in real-world applications. We also demonstrate targeted attacks, in which these attacks successfully generate a specific, predefined output scene.
    Self-supervised Deep Unrolled Reconstruction Using Regularization by Denoising. (arXiv:2205.03519v3 [eess.IV] UPDATED)
    Deep learning methods have been successfully used in various computer vision tasks. Inspired by that success, deep learning has been explored in magnetic resonance imaging (MRI) reconstruction. In particular, integrating deep learning and model-based optimization methods has shown considerable advantages. However, a large amount of labeled training data is typically needed for high reconstruction quality, which is challenging for some MRI applications. In this paper, we propose a novel reconstruction method, named DURED-Net, that enables interpretable self-supervised learning for MR image reconstruction by combining a self-supervised denoising network and a plug-and-play method. We aim to boost the reconstruction performance of Noise2Noise in MR reconstruction by adding an explicit prior that utilizes imaging physics. Specifically, the denoising network is leveraged for MRI reconstruction using Regularization by Denoising (RED). Experimental results demonstrate that the proposed method requires a reduced amount of training data to achieve high reconstruction quality compared with state-of-the-art MR reconstruction methods that utilize Noise2Noise.
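    Regularization by Denoising itself admits a compact gradient iteration: under the standard RED assumptions on the denoiser $D$, the prior $\rho(x)=\tfrac{1}{2}x^{\top}(x-D(x))$ has gradient $x - D(x)$, giving the update sketched below. The forward operator, denoiser, and step size are placeholders, and the paper's self-supervised Noise2Noise denoiser is not reproduced here.

```python
import numpy as np

def red_reconstruction(y, A, At, denoiser, lam=0.1, step=0.5, iters=50):
    """Gradient-descent RED iteration for min_x 0.5*||Ax - y||^2 + lam * rho(x).

    A / At are callables implementing the forward operator and its adjoint;
    denoiser is any (ideally locally homogeneous) denoising function.
    """
    x = At(y)
    for _ in range(iters):
        data_grad = At(A(x) - y)          # gradient of the data-fidelity term
        prior_grad = x - denoiser(x)      # RED prior gradient under its assumptions
        x = x - step * (data_grad + lam * prior_grad)
    return x

# Toy usage: identity forward model with a box-filter "denoiser".
A = At = lambda v: v
box = lambda v: np.convolve(v, np.ones(5) / 5, mode="same")
x_hat = red_reconstruction(np.random.randn(128), A, At, box)
```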
    EAG-RS: A Novel Explainability-guided ROI-Selection Framework for ASD Diagnosis via Inter-regional Relation Learning. (arXiv:2310.03404v1 [cs.LG])
    Deep learning models based on resting-state functional magnetic resonance imaging (rs-fMRI) have been widely used to diagnose brain diseases, particularly autism spectrum disorder (ASD). Existing studies have leveraged the functional connectivity (FC) of rs-fMRI, achieving notable classification performance. However, they have significant limitations, including the lack of adequate information while using linear low-order FC as inputs to the model, not considering individual characteristics (i.e., different symptoms or varying stages of severity) among patients with ASD, and the non-explainability of the decision process. To address these limitations, we propose a novel explainability-guided region of interest (ROI) selection (EAG-RS) framework that identifies non-linear high-order functional associations among brain regions by leveraging an explainable artificial intelligence technique and selects class-discriminative regions for brain disease identification. The proposed framework includes three steps: (i) inter-regional relation learning to estimate non-linear relations through random seed-based network masking, (ii) explainable connection-wise relevance score estimation to explore high-order relations between functional connections, and (iii) non-linear high-order FC-based diagnosis-informative ROI selection and classifier learning to identify ASD. We validated the effectiveness of our proposed method by conducting experiments using the Autism Brain Imaging Database Exchange (ABIDE) dataset, demonstrating that the proposed method outperforms other comparative methods in terms of various evaluation metrics. Furthermore, we qualitatively analyzed the selected ROIs and identified ASD subtypes linked to previous neuroscientific studies.
    On the Implicit Bias of Adam. (arXiv:2309.00079v3 [cs.LG] UPDATED)
    In previous literature, backward error analysis was used to find ordinary differential equations (ODEs) approximating the gradient descent trajectory. It was found that finite step sizes implicitly regularize solutions because terms appearing in the ODEs penalize the two-norm of the loss gradients. We prove that the existence of similar implicit regularization in RMSProp and Adam depends on their hyperparameters and the training stage, but with a different "norm" involved: the corresponding ODE terms either penalize the (perturbed) one-norm of the loss gradients or, on the contrary, hinder its decrease (the latter case being typical). We also conduct numerical experiments and discuss how the proven facts can influence generalization.
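    For comparison, the classical backward-error-analysis result for plain gradient descent with step size $h$ is the modified flow $\dot{\theta} = -\nabla_{\theta}\big(L(\theta) + \tfrac{h}{4}\|\nabla_{\theta} L(\theta)\|_2^2\big)$, in which the two-norm penalty appears explicitly; the paper derives the analogous, hyperparameter-dependent corrections for RMSProp and Adam, which instead involve a perturbed one-norm of the gradients.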
    SFUSNet: A Spatial-Frequency domain-based Multi-branch Network for diagnosis of Cervical Lymph Node Lesions in Ultrasound Images. (arXiv:2308.16738v2 [eess.IV] UPDATED)
    The rapid progress of deep learning has substantially improved the diagnosis of diverse lesions in ultrasound images, but a conspicuous research gap concerning cervical lymph node lesions remains. The objective of this work is to diagnose cervical lymph node lesions in ultrasound images by leveraging a deep learning model. To this end, we first collected 3392 cervical ultrasound images containing normal lymph nodes, benign lymph node lesions, malignant primary lymph node lesions, and malignant metastatic lymph node lesions. Given that ultrasound images are generated by the reflection and scattering of sound waves across varied bodily tissues, we proposed the Conv-FFT Block. It integrates convolutional operations with the fast Fourier transform to model the images more effectively. Building upon this foundation, we designed a novel architecture, named SFUSNet. SFUSNet not only discerns variances in ultrasound images from the spatial domain but also adeptly captures micro-structural alterations across various lesions in the frequency domain. To ascertain the potential of SFUSNet, we benchmarked it against 12 popular architectures through five-fold cross-validation. The results show that SFUSNet achieves state-of-the-art performance with 92.89% accuracy. Moreover, its average precision, average sensitivity and average specificity for four types of lesions reach 90.46%, 89.95% and 97.49%, respectively.
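    A hedged sketch of what a Conv-FFT-style block could look like is given below: one branch applies an ordinary convolution in the spatial domain, the other convolves the log-magnitude spectrum obtained with a 2D FFT, and the two branches are concatenated. The exact branch design, normalization, and fusion used in SFUSNet may differ.

```python
import torch
import torch.nn as nn

class ConvFFTBlock(nn.Module):
    """Illustrative spatial + frequency-domain block (not the paper's exact layer)."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.spatial = nn.Conv2d(in_ch, out_ch // 2, kernel_size=3, padding=1)
        self.spectral = nn.Conv2d(in_ch, out_ch // 2, kernel_size=3, padding=1)

    def forward(self, x):
        s = self.spatial(x)
        # Log-magnitude spectrum; fft2 preserves spatial shape so the branches align.
        mag = torch.log1p(torch.abs(torch.fft.fft2(x, norm="ortho")))
        f = self.spectral(mag)
        return torch.cat([s, f], dim=1)

block = ConvFFTBlock(1, 32)
out = block(torch.randn(2, 1, 64, 64))   # -> (2, 32, 64, 64)
```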
    Agent Instructs Large Language Models to be General Zero-Shot Reasoners. (arXiv:2310.03710v1 [cs.CL])
    We introduce a method to improve the zero-shot reasoning abilities of large language models on general language understanding tasks. Specifically, we build an autonomous agent to instruct the reasoning process of large language models. We show this approach further unleashes the zero-shot reasoning abilities of large language models to more tasks. We study the performance of our method on a wide set of datasets spanning generation, classification, and reasoning. We show that our method generalizes to most tasks and obtains state-of-the-art zero-shot performance on 20 of the 29 datasets that we evaluate. For instance, our method boosts the performance of state-of-the-art large language models by a large margin, including Vicuna-13b (13.3%), Llama-2-70b-chat (23.2%), and GPT-3.5 Turbo (17.0%). Compared to zero-shot chain of thought, our improvement in reasoning is striking, with an average increase of 10.5%. With our method, Llama-2-70b-chat outperforms zero-shot GPT-3.5 Turbo by 10.2%.
    DyVal: Graph-informed Dynamic Evaluation of Large Language Models. (arXiv:2309.17167v2 [cs.AI] UPDATED)
    Large language models (LLMs) have achieved remarkable performance in various evaluation benchmarks. However, concerns about their performance are raised on potential data contamination in their considerable volume of training corpus. Moreover, the static nature and fixed complexity of current benchmarks may inadequately gauge the advancing capabilities of LLMs. In this paper, we introduce DyVal, a novel, general, and flexible evaluation protocol for dynamic evaluation of LLMs. Based on our proposed dynamic evaluation framework, we build graph-informed DyVal by leveraging the structural advantage of directed acyclic graphs to dynamically generate evaluation samples with controllable complexities. DyVal generates challenging evaluation sets on reasoning tasks including mathematics, logical reasoning, and algorithm problems. We evaluate various LLMs ranging from Flan-T5-large to ChatGPT and GPT4. Experiments demonstrate that LLMs perform worse in DyVal-generated evaluation samples with different complexities, emphasizing the significance of dynamic evaluation. We also analyze the failure cases and results of different prompting methods. Moreover, DyVal-generated samples are not only evaluation sets, but also helpful data for fine-tuning to improve the performance of LLMs on existing benchmarks. We hope that DyVal can shed light on the future evaluation research of LLMs.
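    The graph-based generation idea can be illustrated with a tiny arithmetic example: leaf nodes hold random values, internal nodes hold operators, and the evaluation sample is produced by evaluating the graph bottom-up, so difficulty is controlled by depth and width. This toy generator builds a tree-shaped graph and only mirrors the general mechanism, not DyVal's actual task templates.

```python
import random
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def make_arithmetic_sample(depth=3, seed=0):
    """Build and evaluate a random binary arithmetic expression tree of the given depth."""
    rng = random.Random(seed)

    def build(d):
        if d == 0:
            v = rng.randint(1, 9)          # leaf: a random operand
            return str(v), v
        op = rng.choice(list(OPS))          # internal node: a random operator
        left_expr, left_val = build(d - 1)
        right_expr, right_val = build(d - 1)
        return f"({left_expr} {op} {right_expr})", OPS[op](left_val, right_val)

    expr, answer = build(depth)
    return f"What is the value of {expr}?", answer

print(make_arithmetic_sample(depth=3))   # deeper graphs yield harder samples
```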
    Quantitative CLTs in Deep Neural Networks. (arXiv:2307.06092v4 [cs.LG] UPDATED)
    We study the distribution of a fully connected neural network with random Gaussian weights and biases in which the hidden layer widths are proportional to a large constant $n$. Under mild assumptions on the non-linearity, we obtain quantitative bounds on normal approximations valid at large but finite $n$ and any fixed network depth. Our theorems show both for the finite-dimensional distributions and the entire process, that the distance between a random fully connected network (and its derivatives) to the corresponding infinite width Gaussian process scales like $n^{-\gamma}$ for $\gamma>0$, with the exponent depending on the metric used to measure discrepancy. Our bounds are strictly stronger in terms of their dependence on network width than any previously available in the literature; in the one-dimensional case, we also prove that they are optimal, i.e., we establish matching lower bounds.
    DISCO-10M: A Large-Scale Music Dataset. (arXiv:2306.13512v2 [cs.SD] UPDATED)
    Music datasets play a crucial role in advancing research in machine learning for music. However, existing music datasets suffer from limited size, accessibility, and lack of audio resources. To address these shortcomings, we present DISCO-10M, a novel and extensive music dataset that surpasses the largest previously available music dataset by an order of magnitude. To ensure high-quality data, we implement a multi-stage filtering process. This process incorporates similarities based on textual descriptions and audio embeddings. Moreover, we provide precomputed CLAP embeddings alongside DISCO-10M, facilitating direct application on various downstream tasks. These embeddings enable efficient exploration of machine learning applications on the provided data. With DISCO-10M, we aim to democratize and facilitate new research to help advance the development of novel machine learning models for music.
    SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning. (arXiv:2308.00436v3 [cs.AI] UPDATED)
    The recent progress in large language models (LLMs), especially the invention of chain-of-thought prompting, has made it possible to automatically answer questions by stepwise reasoning. However, when faced with more complicated problems that require non-linear thinking, even the strongest LLMs make mistakes. To address this, we explore whether LLMs are able to recognize errors in their own step-by-step reasoning, without resorting to external resources. To this end, we propose SelfCheck, a general-purpose zero-shot verification schema for recognizing such errors. We then use the results of these checks to improve question-answering performance by conducting weighted voting on multiple solutions to the question. We test SelfCheck on three datasets (GSM8K, MathQA, and MATH) and find that it successfully recognizes errors and, in turn, increases final answer accuracies.
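    The final aggregation step can be stated compactly: each sampled solution contributes its verification-derived confidence as a vote weight on its final answer. The confidence values below are placeholders for whatever the checking stage produces.

```python
from collections import defaultdict

def weighted_vote(solutions):
    """solutions: list of (final_answer, confidence) pairs from independent samples."""
    scores = defaultdict(float)
    for answer, confidence in solutions:
        scores[answer] += confidence
    return max(scores, key=scores.get)

# Three sampled chains of thought; the agreeing answer with higher check scores wins.
print(weighted_vote([("42", 0.9), ("41", 0.3), ("42", 0.7)]))   # -> "42"
```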
    Solving a Class of Non-Convex Minimax Optimization in Federated Learning. (arXiv:2310.03613v1 [cs.LG])
    Minimax problems arise throughout machine learning applications, ranging from adversarial training and policy evaluation in reinforcement learning to AUROC maximization. To address the large-scale data challenges across multiple clients with communication-efficient distributed training, federated learning (FL) is gaining popularity. Many optimization algorithms for minimax problems have been developed in the centralized setting (\emph{i.e.}, single-machine). Nonetheless, algorithms for minimax problems under FL are still underexplored. In this paper, we study a class of federated nonconvex minimax optimization problems. We propose FL algorithms (FedSGDA+ and FedSGDA-M) and reduce existing complexity results for the most common minimax problems. For nonconvex-concave problems, we propose FedSGDA+ and reduce the communication complexity to $O(\varepsilon^{-6})$. Under nonconvex-strongly-concave and nonconvex-PL minimax settings, we prove that FedSGDA-M has the best-known sample complexity of $O(\kappa^{3} N^{-1}\varepsilon^{-3})$ and the best-known communication complexity of $O(\kappa^{2}\varepsilon^{-2})$. FedSGDA-M is the first algorithm to match the best sample complexity $O(\varepsilon^{-3})$ achieved by the single-machine method under the nonconvex-strongly-concave setting. Extensive experimental results on fair classification and AUROC maximization show the efficiency of our algorithms.
    Multimarginal generative modeling with stochastic interpolants. (arXiv:2310.03695v1 [cs.LG])
    Given a set of $K$ probability densities, we consider the multimarginal generative modeling problem of learning a joint distribution that recovers these densities as marginals. The structure of this joint distribution should identify multi-way correspondences among the prescribed marginals. We formalize an approach to this task within a generalization of the stochastic interpolant framework, leading to efficient learning algorithms built upon dynamical transport of measure. Our generative models are defined by velocity and score fields that can be characterized as the minimizers of simple quadratic objectives, and they are defined on a simplex that generalizes the time variable in the usual dynamical transport framework. The resulting transport on the simplex is influenced by all marginals, and we show that multi-way correspondences can be extracted. The identification of such correspondences has applications to style transfer, algorithmic fairness, and data decorruption. In addition, the multimarginal perspective enables an efficient algorithm for reducing the dynamical transport cost in the ordinary two-marginal setting. We demonstrate these capacities with several numerical examples.
    Landscape-Sketch-Step: An AI/ML-Based Metaheuristic for Surrogate Optimization Problems. (arXiv:2309.07936v3 [cs.LG] UPDATED)
    In this paper, we introduce a new heuristic for global optimization in scenarios where extensive evaluations of the cost function are expensive, inaccessible, or even prohibitive. The method, which we call Landscape-Sketch-and-Step (LSS), combines Machine Learning, Stochastic Optimization, and Reinforcement Learning techniques, relying on historical information from previously sampled points to make judicious choices of the parameter values at which the cost function should be evaluated. Unlike optimization by Replica Exchange Monte Carlo methods, the number of evaluations of the cost function required in this approach is comparable to that used by Simulated Annealing, a quality that is especially important in contexts like high-throughput computing or high-performance computing tasks, where evaluations are either computationally expensive or take a long time to be performed. The method also differs from standard Surrogate Optimization techniques, for it does not construct a surrogate model that aims at approximating or reconstructing the objective function. We illustrate our method by applying it to low-dimensional optimization problems (dimensions 1, 2, 4, and 8) that mimic known difficulties of minimization on rugged energy landscapes often seen in Condensed Matter Physics, where cost functions are rugged and plagued with local minima. When compared to classical Simulated Annealing, LSS shows an effective acceleration of the optimization process.
    Large-scale investigation of weakly-supervised deep learning for the fine-grained semantic indexing of biomedical literature. (arXiv:2301.09350v2 [cs.CL] UPDATED)
    Objective: Semantic indexing of biomedical literature is usually done at the level of MeSH descriptors with several related but distinct biomedical concepts often grouped together and treated as a single topic. This study proposes a new method for the automated refinement of subject annotations at the level of MeSH concepts. Methods: Lacking labelled data, we rely on weak supervision based on concept occurrence in the abstract of an article, which is also enhanced by dictionary-based heuristics. In addition, we investigate deep learning approaches, making design choices to tackle the particular challenges of this task. The new method is evaluated on a large-scale retrospective scenario, based on concepts that have been promoted to descriptors. Results: In our experiments concept occurrence was the strongest heuristic achieving a macro-F1 score of about 0.63 across several labels. The proposed method improved it further by more than 4pp. Conclusion: The results suggest that concept occurrence is a strong heuristic for refining the coarse-grained labels at the level of MeSH concepts and the proposed method improves it further.
    Deep Quantum Graph Dreaming: Deciphering Neural Network Insights into Quantum Experiments. (arXiv:2309.07056v2 [quant-ph] UPDATED)
    Despite their promise to facilitate new scientific discoveries, the opaqueness of neural networks presents a challenge in interpreting the logic behind their findings. Here, we use an explainable-AI (XAI) technique called $inception$ or $deep$ $dreaming$, which was invented in machine learning for computer vision. We use this technique to explore what neural networks learn about quantum optics experiments. Our story begins by training deep neural networks on the properties of quantum systems. Once trained, we "invert" the neural network -- effectively asking how it imagines a quantum system with a specific property, and how it would continuously modify the quantum system to change a property. We find that the network can shift the initial distribution of properties of the quantum system, and we can conceptualize the learned strategies of the neural network. Interestingly, we find that, in the first layers, the neural network identifies simple properties, while in the deeper ones, it can identify complex quantum structures and even quantum entanglement. This is reminiscent of long-understood properties in computer vision, which we now identify in a complex natural science task. Our approach could be useful for developing new, more interpretable AI-based scientific discovery techniques in quantum physics.
    Marginalized Importance Sampling for Off-Environment Policy Evaluation. (arXiv:2309.01807v2 [cs.LG] UPDATED)
    Reinforcement Learning (RL) methods are typically sample-inefficient, making it challenging to train and deploy RL-policies in real world robots. Even a robust policy trained in simulation requires a real-world deployment to assess its performance. This paper proposes a new approach to evaluate the real-world performance of agent policies prior to deploying them in the real world. Our approach incorporates a simulator along with real-world offline data to evaluate the performance of any policy using the framework of Marginalized Importance Sampling (MIS). Existing MIS methods face two challenges: (1) large density ratios that deviate from a reasonable range and (2) indirect supervision, where the ratio needs to be inferred indirectly, thus exacerbating estimation error. Our approach addresses these challenges by introducing the target policy's occupancy in the simulator as an intermediate variable and learning the density ratio as the product of two terms that can be learned separately. The first term is learned with direct supervision and the second term has a small magnitude, thus making it computationally efficient. We analyze the sample complexity as well as error propagation of our two-step procedure. Furthermore, we empirically evaluate our approach on Sim2Sim environments such as Cartpole, Reacher, and Half-Cheetah. Our results show that our method generalizes well across a variety of Sim2Sim gaps, target policies, and offline data collection policies. We also demonstrate the performance of our algorithm on a Sim2Real task of validating the performance of a 7 DoF robotic arm using offline data along with the Gazebo simulator.
    IBCL: Zero-shot Model Generation for Task Trade-offs in Continual Learning. (arXiv:2310.02995v2 [cs.LG] UPDATED)
    Like generic multi-task learning, continual learning has the nature of multi-objective optimization, and therefore faces a trade-off between the performance of different tasks. That is, to optimize for the current task distribution, it may need to compromise performance on some previous tasks. This means that there exist multiple models that are Pareto-optimal at different times, each addressing a distinct task performance trade-off. Researchers have discussed how to train particular models to address specific trade-off preferences. However, existing algorithms require training overheads proportional to the number of preferences -- a large burden when there are multiple, possibly infinitely many, preferences. As a response, we propose Imprecise Bayesian Continual Learning (IBCL). Upon a new task, IBCL (1) updates a knowledge base in the form of a convex hull of model parameter distributions and (2) obtains particular models to address task trade-off preferences in a zero-shot manner. That is, IBCL does not require any additional training overhead to generate preference-addressing models from its knowledge base. We show that models obtained by IBCL have guarantees in identifying the Pareto optimal parameters. Moreover, experiments on standard image classification and NLP tasks support this guarantee. Statistically, IBCL improves average per-task accuracy by up to 23\% and peak per-task accuracy by up to 15\% with respect to the baseline methods, with steadily near-zero or positive backward transfer. Most importantly, IBCL significantly reduces the training overhead from training 1 model per preference to at most 3 models for all preferences.
    Borges and AI. (arXiv:2310.01425v2 [cs.CL] UPDATED)
    Many believe that Large Language Models (LLMs) open the era of Artificial Intelligence (AI). Some see opportunities while others see dangers. Yet both proponents and opponents grasp AI through the imagery popularised by science fiction. Will the machine become sentient and rebel against its creators? Will we experience a paperclip apocalypse? Before answering such questions, we should first ask whether this mental imagery provides a good description of the phenomenon at hand. Understanding weather patterns through the moods of the gods only goes so far. The present paper instead advocates understanding LLMs and their connection to AI through the imagery of Jorge Luis Borges, a master of 20th century literature, forerunner of magical realism, and precursor to postmodern literature. This exercise leads to a new perspective that illuminates the relation between language modelling and artificial intelligence.
    BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection. (arXiv:2308.12439v2 [cs.CR] UPDATED)
    We present a novel defense against backdoor attacks on Deep Neural Networks (DNNs), wherein adversaries covertly implant malicious behaviors (backdoors) into DNNs. Our defense falls within the category of post-development defenses that operate independently of how the model was generated. The proposed defense is built upon a novel reverse engineering approach that can directly extract the backdoor functionality of a given backdoored model into a backdoor expert model. The approach is straightforward -- finetuning the backdoored model over a small set of intentionally mislabeled clean samples, such that it unlearns the normal functionality while still preserving the backdoor functionality, and thus resulting in a model (dubbed a backdoor expert model) that can only recognize backdoor inputs. Based on the extracted backdoor expert model, we show the feasibility of devising highly accurate backdoor input detectors that filter out the backdoor inputs during model inference. Further augmented by an ensemble strategy with a finetuned auxiliary model, our defense, BaDExpert (Backdoor Input Detection with Backdoor Expert), effectively mitigates 17 SOTA backdoor attacks while minimally impacting clean utility. The effectiveness of BaDExpert has been verified on multiple datasets (CIFAR10, GTSRB and ImageNet) across various model architectures (ResNet, VGG, MobileNetV2 and Vision Transformer).
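    A minimal sketch of the extraction step as described: finetune a copy of the backdoored model on a small clean set whose labels have been intentionally shuffled, so the normal functionality is unlearned while the backdoor mapping, which never receives conflicting supervision, is preserved. The optimizer, label-shuffling scheme, and number of epochs here are assumptions, not the paper's settings.

```python
import copy
import torch
import torch.nn.functional as F

def extract_backdoor_expert(backdoored_model, clean_loader, lr=1e-3, epochs=1):
    """Finetune on intentionally mislabeled clean data to unlearn normal behavior."""
    expert = copy.deepcopy(backdoored_model)
    opt = torch.optim.SGD(expert.parameters(), lr=lr)
    expert.train()
    for _ in range(epochs):
        for x, y in clean_loader:
            wrong_y = torch.roll(y, shifts=1)          # simple placeholder mislabeling
            loss = F.cross_entropy(expert(x), wrong_y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return expert   # should now respond mostly to backdoor-triggered inputs
```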
    Formally Explaining Neural Networks within Reactive Systems. (arXiv:2308.00143v3 [cs.AI] UPDATED)
    Deep neural networks (DNNs) are increasingly being used as controllers in reactive systems. However, DNNs are highly opaque, which renders it difficult to explain and justify their actions. To mitigate this issue, there has been a surge of interest in explainable AI (XAI) techniques, capable of pinpointing the input features that caused the DNN to act as it did. Existing XAI techniques typically face two limitations: (i) they are heuristic, and do not provide formal guarantees that the explanations are correct; and (ii) they often apply to ``one-shot'' systems, where the DNN is invoked independently of past invocations, as opposed to reactive systems. Here, we begin bridging this gap, and propose a formal DNN-verification-based XAI technique for reasoning about multi-step, reactive systems. We suggest methods for efficiently calculating succinct explanations, by exploiting the system's transition constraints in order to curtail the search space explored by the underlying verifier. We evaluate our approach on two popular benchmarks from the domain of automated navigation; and observe that our methods allow the efficient computation of minimal and minimum explanations, significantly outperforming the state of the art. We also demonstrate that our methods produce formal explanations that are more reliable than competing, non-verification-based XAI techniques.
    Probabilistically Rewired Message-Passing Neural Networks. (arXiv:2310.02156v2 [cs.LG] UPDATED)
    Message-passing graph neural networks (MPNNs) emerged as powerful tools for processing graph-structured input. However, they operate on a fixed input graph structure, ignoring potential noise and missing information. Furthermore, their local aggregation mechanism can lead to problems such as over-squashing and limited expressive power in capturing relevant graph structures. Existing solutions to these challenges have primarily relied on heuristic methods, often disregarding the underlying data distribution. Hence, devising principled approaches for learning to infer graph structures relevant to the given prediction task remains an open challenge. In this work, leveraging recent progress in exact and differentiable $k$-subset sampling, we devise probabilistically rewired MPNNs (PR-MPNNs), which learn to add relevant edges while omitting less beneficial ones. For the first time, our theoretical analysis explores how PR-MPNNs enhance expressive power, and we identify precise conditions under which they outperform purely randomized approaches. Empirically, we demonstrate that our approach effectively mitigates issues like over-squashing and under-reaching. In addition, on established real-world datasets, our method exhibits competitive or superior predictive performance compared to traditional MPNN models and recent graph transformer architectures.
    Transferring Annotator- and Instance-dependent Transition Matrix for Learning from Crowds. (arXiv:2306.03116v2 [cs.HC] UPDATED)
    Learning from crowds refers to the setting in which the annotations of training data are obtained from crowd-sourcing services. Multiple annotators each complete their own small part of the annotations, where labeling mistakes that depend on annotators occur frequently. Modeling the label-noise generation process by the noise transition matrix is a powerful tool to tackle the label noise. In real-world crowd-sourcing scenarios, noise transition matrices are both annotator- and instance-dependent. However, due to the high complexity of annotator- and instance-dependent transition matrices (AIDTM), annotation sparsity, which means each annotator only labels a small portion of the instances, makes modeling AIDTM very challenging. Prior works simplify the problem by assuming the transition matrix is instance-independent or using simple parametric forms, which lose modeling generality. Motivated by this, we target a more realistic problem, estimating general AIDTM in practice. Without losing modeling generality, we parameterize AIDTM with deep neural networks. To alleviate the modeling challenge, we suppose every annotator shares its noise pattern with similar annotators, and estimate AIDTM via knowledge transfer. We hence first model the mixture of noise patterns by all annotators, and then transfer this modeling to individual annotators. Furthermore, considering that the transfer from the mixture of noise patterns to individuals may cause two annotators with highly different noise generations to perturb each other, we employ the knowledge transfer between identified neighboring annotators to calibrate the modeling. Theoretical analyses are derived to demonstrate that both the knowledge transfer from global to individuals and the knowledge transfer between neighboring individuals can help model general AIDTM. Experiments confirm the superiority of the proposed approach on synthetic and real-world crowd-sourcing data.
    Burning the Adversarial Bridges: Robust Windows Malware Detection Against Binary-level Mutations. (arXiv:2310.03285v1 [cs.LG])
    Toward robust malware detection, we explore the attack surface of existing malware detection systems. We conduct root-cause analyses of the practical binary-level black-box adversarial malware examples. Additionally, we uncover the sensitivity of volatile features within the detection engines and exhibit their exploitability. Highlighting volatile information channels within the software, we introduce three software pre-processing steps to eliminate the attack surface, namely, padding removal, software stripping, and inter-section information resetting. Further, to counter the emerging section injection attacks, we propose a graph-based section-dependent information extraction scheme for software representation. The proposed scheme leverages aggregated information within various sections in the software to enable robust malware detection and mitigate adversarial settings. Our experimental results show that traditional malware detection models are ineffective against adversarial threats. However, the attack surface can be largely reduced by eliminating the volatile information. Therefore, we propose simple-yet-effective methods to mitigate the impacts of binary manipulation attacks. Overall, our graph-based malware detection scheme can accurately detect malware with an area under the curve (AUC) score of 88.32%, and 88.19% under a combination of binary manipulation attacks, demonstrating the effectiveness of our proposed scheme.
    Multiple Case Physics-Informed Neural Network for Biomedical Tube Flows. (arXiv:2309.15294v2 [physics.flu-dyn] UPDATED)
    Fluid dynamics computations for tube-like geometries are important for biomedical evaluation of vascular and airway fluid dynamics. Physics-Informed Neural Networks (PINNs) have recently emerged as a good alternative to traditional computational fluid dynamics (CFD) methods. The vanilla PINN, however, requires much longer training time than the traditional CFD methods for each specific flow scenario and thus does not justify its mainstream use. Here, we explore the use of the multi-case PINN approach for calculating biomedical tube flows, where varied geometry cases are parameterized and pre-trained on the PINN, such that results for unseen geometries can be obtained in real time. Our objective is to identify network-architecture, tube-specific, and regularization strategies that optimize this approach, via experiments on a series of idealized 2D stenotic tube flows.
    Rayleigh Quotient Graph Neural Networks for Graph-level Anomaly Detection. (arXiv:2310.02861v2 [cs.LG] UPDATED)
    Graph-level anomaly detection has gained significant attention as it finds many applications in various domains, such as cancer diagnosis and enzyme prediction. However, existing methods fail to capture the underlying properties of graph anomalies, resulting in unexplainable framework design and unsatisfying performance. In this paper, we take a step back and re-investigate the spectral differences between anomalous and normal graphs. Our main observation shows a significant disparity in the accumulated spectral energy between these two classes. Moreover, we prove that the accumulated spectral energy of the graph signal can be represented by its Rayleigh Quotient, indicating that the Rayleigh Quotient is a driving factor behind the anomalous properties of graphs. Motivated by this, we propose Rayleigh Quotient Graph Neural Network (RQGNN), the first spectral GNN for graph-level anomaly detection, providing a new perspective on exploring the inherent spectral features of anomalous graphs. Specifically, we introduce a novel framework that consists of two components: the Rayleigh Quotient learning component (RQL) and Chebyshev Wavelet GNN with RQ-pooling (CWGNN-RQ). RQL explicitly captures the Rayleigh Quotient of graphs and CWGNN-RQ implicitly explores the spectral space of graphs. Extensive experiments on 10 real-world datasets show that RQGNN outperforms the best rival by 6.74% in Macro-F1 score and 1.44% in AUC, demonstrating the effectiveness of our framework.
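    For readers unfamiliar with the central quantity, the Rayleigh quotient of a graph signal x with respect to the graph Laplacian L is x^T L x / x^T x. The minimal numpy sketch below is a generic illustration (the adjacency matrix and signal are toy inputs), not the paper's RQL component.

```python
import numpy as np

def rayleigh_quotient(adjacency, signal):
    """Rayleigh quotient x^T L x / x^T x of a graph signal x, with L the combinatorial
    Laplacian; larger values mean more of the signal's energy is high-frequency."""
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    return float(signal @ laplacian @ signal) / float(signal @ signal)

# Toy example: a path graph on three nodes and an alternating signal.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
x = np.array([1.0, 0.0, -1.0])
print(rayleigh_quotient(A, x))   # 1.0
```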
    PINNacle: A Comprehensive Benchmark of Physics-Informed Neural Networks for Solving PDEs. (arXiv:2306.08827v2 [cs.LG] UPDATED)
    While significant progress has been made on Physics-Informed Neural Networks (PINNs), a comprehensive comparison of these methods across a wide range of Partial Differential Equations (PDEs) is still lacking. This study introduces PINNacle, a benchmarking tool designed to fill this gap. PINNacle provides a diverse dataset, comprising over 20 distinct PDEs from various domains, including heat conduction, fluid dynamics, biology, and electromagnetics. These PDEs encapsulate key challenges inherent to real-world problems, such as complex geometry, multi-scale phenomena, nonlinearity, and high dimensionality. PINNacle also offers a user-friendly toolbox, incorporating about 10 state-of-the-art PINN methods for systematic evaluation and comparison. We have conducted extensive experiments with these methods, offering insights into their strengths and weaknesses. In addition to providing a standardized means of assessing performance, PINNacle also offers an in-depth analysis to guide future research, particularly in areas such as domain decomposition methods and loss reweighting for handling multi-scale problems and complex geometry. To the best of our knowledge, it is the largest benchmark with a diverse and comprehensive evaluation that will undoubtedly foster further research in PINNs.
    Reconstructing Existing Levels through Level Inpainting. (arXiv:2309.09472v3 [cs.CV] UPDATED)
    Procedural Content Generation (PCG) and Procedural Content Generation via Machine Learning (PCGML) have been used in prior work for generating levels in various games. This paper introduces Content Augmentation and focuses on the subproblem of level inpainting, which involves reconstructing and extending video game levels. Drawing inspiration from image inpainting, we adapt two techniques from this domain to address our specific use case. We present two approaches for level inpainting: an Autoencoder and a U-net. Through a comprehensive case study, we demonstrate their superior performance compared to a baseline method and discuss their relative merits. Furthermore, we provide a practical demonstration of both approaches for the level inpainting task and offer insights into potential directions for future research.
    Sampling via Gradient Flows in the Space of Probability Measures. (arXiv:2310.03597v1 [stat.ML])
    Sampling a target probability distribution with an unknown normalization constant is a fundamental challenge in computational science and engineering. Recent work shows that algorithms derived by considering gradient flows in the space of probability measures open up new avenues for algorithm development. This paper makes three contributions to this sampling approach by scrutinizing the design components of such gradient flows. Any instantiation of a gradient flow for sampling needs an energy functional and a metric to determine the flow, as well as numerical approximations of the flow to derive algorithms. Our first contribution is to show that the Kullback-Leibler divergence, as an energy functional, has the unique property (among all f-divergences) that gradient flows resulting from it do not depend on the normalization constant of the target distribution. Our second contribution is to study the choice of metric from the perspective of invariance. The Fisher-Rao metric is known as the unique choice (up to scaling) that is diffeomorphism invariant. As a computationally tractable alternative, we introduce a relaxed, affine invariance property for the metrics and gradient flows. In particular, we construct various affine invariant Wasserstein and Stein gradient flows. Affine invariant gradient flows are shown to behave more favorably than their non-affine-invariant counterparts when sampling highly anisotropic distributions, in theory and by using particle methods. Our third contribution is to study, and to develop efficient algorithms based on, Gaussian approximations of the gradient flows; this leads to an alternative to particle methods. We establish connections between various Gaussian approximate gradient flows, discuss their relation to gradient methods arising from parametric variational inference, and study their convergence properties both theoretically and numerically.
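    The first contribution rests on a short, standard calculation worth spelling out: writing the target as $\pi = \tilde{\pi}/Z$ with unknown normalization constant $Z$, the KL energy splits as

    $$\mathrm{KL}(\rho \,\|\, \pi) = \int \rho \log \rho \, dx - \int \rho \log \tilde{\pi} \, dx + \log Z,$$

    and since the $\log Z$ term does not depend on $\rho$, any gradient flow of this energy, whatever metric is chosen, only ever requires the unnormalized density $\tilde{\pi}$. This is a textbook observation explaining why the KL flow needs no normalization constant; the paper's stronger claim is that KL is the only f-divergence with this property.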
    Generative models for two-ground-truth partitions in networks. (arXiv:2302.02787v3 [cs.SI] UPDATED)
    A myriad of approaches have been proposed to characterise the mesoscale structure of networks - most often as a partition based on patterns variously called communities, blocks, or clusters. Clearly, distinct methods designed to detect different types of patterns may provide a variety of answers to the network's mesoscale structure. Yet, even multiple runs of a given method can sometimes yield diverse and conflicting results, producing entire landscapes of partitions which potentially include multiple (locally optimal) mesoscale explanations of the network. Such ambiguity motivates a closer look at the ability of these methods to find multiple qualitatively different 'ground truth' partitions in a network. Here, we propose the stochastic cross-block model (SCBM), a generative model which allows for two distinct partitions to be built into the mesoscale structure of a single benchmark network. We demonstrate a use case of the benchmark model by appraising the power of stochastic block models (SBMs) to detect implicitly planted coexisting bi-community and core-periphery structures of different strengths. Given our model design and experimental set-up, we find that the ability to detect the two partitions individually varies by SBM variant and that coexistence of both partitions is recovered only in a very limited number of cases. Our findings suggest that in most instances only one - in some way dominating - structure can be detected, even in the presence of other partitions. They underline the need for considering entire landscapes of partitions when different competing explanations exist and motivate future research to advance partition coexistence detection methods. Our model also contributes to the field of benchmark networks more generally by enabling further exploration of the ability of new and existing methods to detect ambiguity in the mesoscale structure of networks.
    Numerical Weather Forecasting using Convolutional-LSTM with Attention and Context Matcher Mechanisms. (arXiv:2102.00696v2 [cs.LG] UPDATED)
    Numerical weather forecasting using high-resolution physical models often requires extensive computational resources on supercomputers, which diminishes their wide usage in most real-life applications. As a remedy, applying deep learning methods has revealed innovative solutions within this field. To this end, we introduce a novel deep learning architecture for forecasting high-resolution spatio-temporal weather data. Our approach extends the conventional encoder-decoder structure by integrating Convolutional Long Short-Term Memory and Convolutional Neural Networks. In addition, we incorporate attention and context matcher mechanisms into the model architecture. Our Weather Model achieves significant performance improvements compared to baseline deep learning models, including ConvLSTM, TrajGRU, and U-Net. Our experimental evaluation involves large-scale, real-world benchmark numerical weather datasets, namely the ERA5 hourly dataset on pressure levels and WeatherBench. Our results demonstrate substantial improvements in identifying spatial and temporal correlations with attention matrices focusing on distinct parts of the input series to model atmospheric circulations. We also compare our model with high-resolution physical models using the benchmark metrics and show that our Weather Model is accurate and easy to interpret.
    Spatial-temporal associations representation and application for process monitoring using graph convolution neural network. (arXiv:2205.05250v2 [cs.LG] UPDATED)
    This work has been accepted for publication in the journal "Process Safety and Environmental Protection"; we thank the editors, reviewers, and colleagues for their comments and guidance. The paper addresses the spatial-temporal associations among the numerous variables of dynamic industrial processes: these variables are not only highly correlated in time but also interrelated in space. To handle this problem, three key issues need to be well addressed: variable characteristics modeling and representation, graph network construction (temporal information), and graph characteristics perception. The first issue is addressed by assuming the data follow an improved Gaussian distribution, while the graph network is defined by the monitoring variables and edges computed from their temporal characteristics. Finally, the networks corresponding to process states at different times are fed into a graph convolutional neural network that performs graph classification to achieve process monitoring. A benchmark experiment (the Tennessee Eastman chemical process) and an application study (cobalt purification from zinc solution) demonstrate the feasibility and applicability of the proposed method.
    Disentangling the Link Between Image Statistics and Human Perception. (arXiv:2303.09874v3 [cs.CV] UPDATED)
    In the 1950s, Barlow and Attneave hypothesised a link between biological vision and information maximisation. Following Shannon, information was defined using the probability of natural images. A number of physiological and psychophysical phenomena have been derived ever since from principles like info-max, efficient coding, or optimal denoising. However, it remains unclear how this link is expressed in mathematical terms from image probability. First, classical derivations relied on strong assumptions about the probability models and the behaviour of the sensors. Moreover, the direct evaluation of the hypothesis was limited by the inability of the classical image models to deliver accurate estimates of the probability. In this work we directly evaluate image probabilities using an advanced generative model for natural images, and we analyse how probability-related factors can be combined to predict human perception, as captured by the sensitivity of state-of-the-art subjective image quality metrics. We use information theory and regression analysis to find a combination of just two probability-related factors that achieves 0.8 correlation with subjective metrics. This probability-based sensitivity is psychophysically validated by reproducing the basic trends of the Contrast Sensitivity Function, its suprathreshold variation, and trends of the Weber law and masking.
    OMG-ATTACK: Self-Supervised On-Manifold Generation of Transferable Evasion Attacks. (arXiv:2310.03707v1 [cs.LG])
    Evasion Attacks (EA) are used to test the robustness of trained neural networks by distorting input data to misguide the model into incorrect classifications. Creating these attacks is a challenging task, especially with the ever-increasing complexity of models and datasets. In this work, we introduce a self-supervised, computationally economical method for generating adversarial examples, designed for the unseen black-box setting. Adapting techniques from representation learning, our method generates on-manifold EAs that are encouraged to resemble the data distribution. These attacks are comparable in effectiveness to the state-of-the-art when attacking the model they were trained on, but are significantly more effective when attacking unseen models, as the attacks are related more to the data than to the model itself. Our experiments consistently demonstrate the method is effective across various models, unseen data categories, and even defended models, suggesting a significant role for on-manifold EAs when targeting unseen models.
    Efficient Biologically Plausible Adversarial Training. (arXiv:2309.17348v3 [cs.LG] UPDATED)
    Artificial Neural Networks (ANNs) trained with Backpropagation (BP) show astounding performance and are increasingly often used in performing our daily life tasks. However, ANNs are highly vulnerable to adversarial attacks, which alter inputs with small targeted perturbations that drastically disrupt the models' performance. The most effective method to make ANNs robust against these attacks is adversarial training, in which the training dataset is augmented with exemplary adversarial samples. Unfortunately, this approach has the drawback of increased training complexity since generating adversarial samples is very computationally demanding. In contrast to ANNs, humans are not susceptible to adversarial attacks. Therefore, in this work, we investigate whether biologically-plausible learning algorithms are more robust against adversarial attacks than BP. In particular, we present an extensive comparative analysis of the adversarial robustness of BP and Present the Error to Perturb the Input To modulate Activity (PEPITA), a recently proposed biologically-plausible learning algorithm, on various computer vision tasks. We observe that PEPITA has higher intrinsic adversarial robustness and, with adversarial training, has a more favourable natural-vs-adversarial performance trade-off as, for the same natural accuracies, PEPITA's adversarial accuracies decrease on average by 0.26% and BP's by 8.05%.
    LoRA ensembles for large language model fine-tuning. (arXiv:2310.00035v2 [cs.LG] UPDATED)
    Finetuned LLMs often exhibit poor uncertainty quantification, manifesting as overconfidence, poor calibration, and unreliable prediction results on test data or out-of-distribution samples. One approach commonly used in vision for alleviating this issue is a deep ensemble, which constructs an ensemble by training the same model multiple times using different random initializations. However, ensembling LLMs poses a major challenge: the most effective LLMs are extremely large. Keeping a single LLM in memory is already challenging enough: keeping an ensemble of e.g. 5 LLMs in memory is impossible in many settings. To address these issues, we propose an ensemble approach using Low-Rank Adapters (LoRA), a parameter-efficient fine-tuning technique. Critically, these low-rank adapters represent a very small number of parameters, orders of magnitude fewer than the underlying pre-trained model. Thus, it is possible to construct large ensembles of LoRA adapters with almost the same computational overhead as using the original model. We find that LoRA ensembles, applied on their own or on top of pre-existing regularization techniques, give consistent improvements in predictive accuracy and uncertainty quantification.
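    A toy sketch of the idea, in plain PyTorch rather than any particular adapter library: several independently initialized low-rank adapters share one frozen base layer, and predictions are averaged in probability space. Layer sizes, rank, scaling, and the ensemble size below are arbitrary illustrative choices, not the paper's configuration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen base linear layer plus a trainable low-rank update (toy sketch)."""
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)            # pre-trained weights stay fixed
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

# An "ensemble" is several independently initialized adapters over one shared frozen
# base; predictions are averaged in probability space.
base = nn.Linear(16, 3)
members = [LoRALinear(base, rank=4) for _ in range(5)]
x = torch.randn(2, 16)
probs = torch.stack([m(x).softmax(dim=-1) for m in members]).mean(dim=0)
print(probs.shape)   # torch.Size([2, 3])
```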
    Learning Robust Statistics for Simulation-based Inference under Model Misspecification. (arXiv:2305.15871v3 [stat.ML] UPDATED)
    Simulation-based inference (SBI) methods such as approximate Bayesian computation (ABC), synthetic likelihood, and neural posterior estimation (NPE) rely on simulating statistics to infer parameters of intractable likelihood models. However, such methods are known to yield untrustworthy and misleading inference outcomes under model misspecification, thus hindering their widespread applicability. In this work, we propose the first general approach to handle model misspecification that works across different classes of SBI methods. Leveraging the fact that the choice of statistics determines the degree of misspecification in SBI, we introduce a regularized loss function that penalises those statistics that increase the mismatch between the data and the model. Taking NPE and ABC as use cases, we demonstrate the superior performance of our method on high-dimensional time-series models that are artificially misspecified. We also apply our method to real data from the field of radio propagation where the model is known to be misspecified. We show empirically that the method yields robust inference in misspecified scenarios, whilst still being accurate when the model is well-specified.
    Analysis of learning a flow-based generative model from limited sample complexity. (arXiv:2310.03575v1 [stat.ML])
    We study the problem of training a flow-based generative model, parametrized by a two-layer autoencoder, to sample from a high-dimensional Gaussian mixture. We provide a sharp end-to-end analysis of the problem. First, we provide a tight closed-form characterization of the learnt velocity field, when parametrized by a shallow denoising auto-encoder trained on a finite number $n$ of samples from the target distribution. Building on this analysis, we provide a sharp description of the corresponding generative flow, which pushes the base Gaussian density forward to an approximation of the target density. In particular, we provide closed-form formulae for the distance between the mean of the generated mixture and the mean of the target mixture, which we show decays as $\Theta_n(\frac{1}{n})$. Finally, this rate is shown to be in fact Bayes-optimal.
    A Comprehensive Survey of Dataset Distillation. (arXiv:2301.05603v3 [cs.LG] UPDATED)
    Deep learning technology has developed unprecedentedly in the last decade and has become the primary choice in many application domains. This progress is mainly attributed to a systematic collaboration in which rapidly growing computing resources encourage advanced algorithms to deal with massive data. However, it has gradually become challenging to handle the unlimited growth of data with limited computing power. To this end, diverse approaches are proposed to improve data processing efficiency. Dataset distillation, a dataset reduction method, addresses this problem by synthesizing a small typical dataset from substantial data and has attracted much attention from the deep learning community. Existing dataset distillation methods can be taxonomized into meta-learning and data matching frameworks according to whether they explicitly mimic the performance of target data. Although dataset distillation has shown surprising performance in compressing datasets, there are still several limitations such as distilling high-resolution data or data with complex label spaces. This paper provides a holistic understanding of dataset distillation from multiple aspects, including distillation frameworks and algorithms, factorized dataset distillation, performance comparison, and applications. Finally, we discuss challenges and promising directions to further promote future studies on dataset distillation.
    Adversarial Machine Learning for Social Good: Reframing the Adversary as an Ally. (arXiv:2310.03614v1 [cs.LG])
    Deep Neural Networks (DNNs) have been the driving force behind many of the recent advances in machine learning. However, research has shown that DNNs are vulnerable to adversarial examples -- input samples that have been perturbed to force DNN-based models to make errors. As a result, Adversarial Machine Learning (AdvML) has gained a lot of attention, and researchers have investigated these vulnerabilities in various settings and modalities. In addition, DNNs have also been found to incorporate embedded bias and often produce unexplainable predictions, which can result in anti-social AI applications. The emergence of new AI technologies that leverage Large Language Models (LLMs), such as ChatGPT and GPT-4, increases the risk of producing anti-social applications at scale. AdvML for Social Good (AdvML4G) is an emerging field that repurposes the AdvML bug to invent pro-social applications. Regulators, practitioners, and researchers should collaborate to encourage the development of pro-social applications and hinder the development of anti-social ones. In this work, we provide the first comprehensive review of the emerging field of AdvML4G. This paper encompasses a taxonomy that highlights the emergence of AdvML4G, a discussion of the differences and similarities between AdvML4G and AdvML, a taxonomy covering social good-related concepts and aspects, an exploration of the motivations behind the emergence of AdvML4G at the intersection of ML4G and AdvML, and an extensive summary of the works that utilize AdvML4G as an auxiliary tool for innovating pro-social applications. Finally, we elaborate upon various challenges and open research issues that require significant attention from the research community.
    Decoding speech perception from non-invasive brain recordings. (arXiv:2208.12266v2 [eess.AS] UPDATED)
    Decoding speech from brain activity is a long-awaited goal in both healthcare and neuroscience. Invasive devices have recently led to major milestones in that regard: deep learning algorithms trained on intracranial recordings now start to decode elementary linguistic features (e.g. letters, words, spectrograms). However, extending this approach to natural speech and non-invasive brain recordings remains a major challenge. Here, we introduce a model trained with contrastive-learning to decode self-supervised representations of perceived speech from the non-invasive recordings of a large cohort of healthy individuals. To evaluate this approach, we curate and integrate four public datasets, encompassing 175 volunteers recorded with magneto- or electro-encephalography (M/EEG), while they listened to short stories and isolated sentences. The results show that our model can identify, from 3 seconds of MEG signals, the corresponding speech segment with up to 41% accuracy out of more than 1,000 distinct possibilities on average across participants, and more than 80% in the very best participants - a performance that allows the decoding of words and phrases absent from the training set. The comparison of our model to a variety of baselines highlights the importance of (i) a contrastive objective, (ii) pretrained representations of speech and (iii) a common convolutional architecture simultaneously trained across multiple participants. Finally, the analysis of the decoder's predictions suggests that they primarily depend on lexical and contextual semantic representations. Overall, this effective decoding of perceived speech from non-invasive recordings delineates a promising path to decode language from brain activity, without putting patients at risk for brain surgery.
    Regression with Label Differential Privacy. (arXiv:2212.06074v3 [cs.LG] UPDATED)
    We study the task of training regression models with the guarantee of label differential privacy (DP). Based on a global prior distribution on label values, which could be obtained privately, we derive a label DP randomization mechanism that is optimal under a given regression loss function. We prove that the optimal mechanism takes the form of a "randomized response on bins", and propose an efficient algorithm for finding the optimal bin values. We carry out a thorough experimental evaluation on several datasets demonstrating the efficacy of our algorithm.
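    As a concrete, generic instance of a "randomized response on bins" mechanism: snap the label to its nearest bin, release that bin with the usual K-ary randomized-response probability, and otherwise release a uniformly random other bin. The bin values below are placeholders; choosing them optimally for a given regression loss is the part the paper addresses.

```python
import numpy as np

def randomized_response_on_bins(label, bins, epsilon, rng=None):
    """Snap a label to its nearest bin, then release the true bin with probability
    e^eps / (e^eps + K - 1) and a uniformly random other bin otherwise. This is the
    generic K-ary randomized-response mechanism (epsilon label-DP); the bin values
    here are placeholders, whereas choosing them optimally is what the paper studies."""
    rng = rng or np.random.default_rng()
    bins = np.asarray(bins, dtype=float)
    K = len(bins)
    true_idx = int(np.argmin(np.abs(bins - label)))
    p_true = np.exp(epsilon) / (np.exp(epsilon) + K - 1)
    if rng.random() < p_true:
        return bins[true_idx]
    others = [i for i in range(K) if i != true_idx]
    return bins[rng.choice(others)]

print(randomized_response_on_bins(3.7, bins=[0.0, 2.5, 5.0, 7.5], epsilon=1.0))
```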
    Unsupervised Foreground Extraction via Deep Region Competition. (arXiv:2110.15497v4 [cs.CV] UPDATED)
    We present Deep Region Competition (DRC), an algorithm designed to extract foreground objects from images in a fully unsupervised manner. Foreground extraction can be viewed as a special case of generic image segmentation that focuses on identifying and disentangling objects from the background. In this work, we rethink the foreground extraction by reconciling energy-based prior with generative image modeling in the form of Mixture of Experts (MoE), where we further introduce the learned pixel re-assignment as the essential inductive bias to capture the regularities of background regions. With this modeling, the foreground-background partition can be naturally found through Expectation-Maximization (EM). We show that the proposed method effectively exploits the interaction between the mixture components during the partitioning process, which closely connects to region competition, a seminal approach for generic image segmentation. Experiments demonstrate that DRC exhibits more competitive performances on complex real-world data and challenging multi-object scenes compared with prior methods. Moreover, we show empirically that DRC can potentially generalize to novel foreground objects even from categories unseen during training.
    Algebraic and Geometric Models for Space Networking. (arXiv:2304.01150v2 [math.AT] UPDATED)
    In this paper we introduce some new algebraic and geometric perspectives on networked space communications. Our main contribution is a novel definition of a time-varying graph (TVG), defined in terms of a matrix with values in subsets of the real line P(R). We leverage semi-ring properties of P(R) to model multi-hop communication in a TVG using matrix multiplication and a truncated Kleene star. This leads to novel statistics on the communication capacity of TVGs called lifetime curves, which we generate for large samples of randomly chosen STARLINK satellites, whose connectivity is modeled over day-long simulations. Determining when a large subsample of STARLINK is temporally strongly connected is further analyzed using novel metrics introduced here that are inspired by topological data analysis (TDA). To better model networking scenarios between the Earth and Mars, we introduce various semi-rings capable of modeling propagation delay as well as protocols common to Delay Tolerant Networking (DTN), such as store-and-forward. Finally, we illustrate the applicability of zigzag persistence for featurizing different space networks and demonstrate the efficacy of K-Nearest Neighbors (KNN) classification for distinguishing Earth-Mars and Earth-Moon satellite systems using time-varying topology alone.
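    To make the semi-ring idea concrete, here is a coarse stand-in that uses sets of discretized contact times instead of subsets of the real line: entrywise "multiplication" is set intersection and "addition" is set union, so the matrix product records when a two-hop route exists. Propagation delay and store-and-forward, which the paper handles with richer semi-rings, are ignored; the toy matrix is illustrative.

```python
def tvg_matmul(A, B):
    """Multiply two time-varying-graph matrices whose entries are sets of discretized
    contact times: entrywise 'product' is set intersection and 'sum' is set union, so
    the product records when a two-hop route exists. A coarse stand-in for the P(R)
    semi-ring; propagation delay and store-and-forward are ignored here."""
    n, m, k = len(A), len(B[0]), len(B)
    out = [[set() for _ in range(m)] for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for l in range(k):
                out[i][j] |= A[i][l] & B[l][j]
    return out

# Toy 3-node network: node 0 reaches node 2 via node 1 only when both hops are up.
A = [[{0, 1, 2}, {1, 2}, set()],
     [set(), {0, 1, 2}, {2, 3}],
     [set(), set(), {0, 1, 2, 3}]]
print(tvg_matmul(A, A)[0][2])   # {2}
```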
    Linking Across Data Granularity: Fitting Multivariate Hawkes Processes to Partially Interval-Censored Data. (arXiv:2111.02062v3 [cs.LG] UPDATED)
    The multivariate Hawkes process (MHP) is widely used for analyzing data streams that interact with each other, where events generate new events within their own dimension (via self-excitation) or across different dimensions (via cross-excitation). However, in certain applications, the timestamps of individual events in some dimensions are unobservable, and only event counts within intervals are known, referred to as partially interval-censored data. The MHP is unsuitable for handling such data since its estimation requires event timestamps. In this study, we introduce the Partial Mean Behavior Poisson (PMBP) process, a novel point process which shares parameter equivalence with the MHP and can effectively model both timestamped and interval-censored data. We demonstrate the capabilities of the PMBP process using synthetic and real-world datasets. Firstly, we illustrate that the PMBP process can approximate MHP parameters and recover the spectral radius using synthetic event histories. Next, we assess the performance of the PMBP process in predicting YouTube popularity and find that it surpasses state-of-the-art methods. Lastly, we leverage the PMBP process to gain qualitative insights from a dataset comprising daily COVID-19 case counts from multiple countries and COVID-19-related news articles. By clustering the PMBP-modeled countries, we unveil hidden interaction patterns between occurrences of COVID-19 cases and news reporting.
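    For context, the conditional intensity of dimension $i$ in a standard multivariate Hawkes process is usually written as

    $$\lambda_i(t) = \mu_i + \sum_{j=1}^{D} \sum_{t_{j,k} < t} \alpha_{ij}\, \varphi_{ij}(t - t_{j,k}),$$

    with background rates $\mu_i$, excitation weights $\alpha_{ij}$, and decay kernels $\varphi_{ij}$ (textbook notation, not necessarily the paper's). Evaluating this likelihood requires the individual event times $t_{j,k}$, which is exactly what partially interval-censored dimensions do not provide; this is the gap the PMBP process is designed to close.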
    Unpaired Image-to-Image Translation via Neural Schr\"odinger Bridge. (arXiv:2305.15086v2 [cs.CV] UPDATED)
    Diffusion models are a powerful class of generative models which simulate stochastic differential equations (SDEs) to generate data from noise. Although diffusion models have achieved remarkable progress in recent years, they have limitations in unpaired image-to-image translation tasks due to the Gaussian prior assumption. The Schr\"odinger Bridge (SB), which learns an SDE to translate between two arbitrary distributions, has emerged as an attractive solution to this problem. However, no SB model so far has been successful at unpaired translation between high-resolution images. In this work, we propose the Unpaired Neural Schr\"odinger Bridge (UNSB), which expresses the SB problem as a sequence of adversarial learning problems. This allows us to incorporate advanced discriminators and regularization to learn an SB between unpaired data. We demonstrate that UNSB is scalable and successfully solves various unpaired image-to-image translation tasks. Code: \url{https://github.com/cyclomon/UNSB}
    Distribution-free risk assessment of regression-based machine learning algorithms. (arXiv:2310.03545v1 [cs.LG])
    Machine learning algorithms have grown in sophistication over the years and are increasingly deployed for real-life applications. However, when using machine learning techniques in practical settings, particularly in high-risk applications such as medicine and engineering, obtaining the failure probability of the predictive model is critical. We refer to this problem as the risk-assessment task. We focus on regression algorithms and the risk-assessment task of computing the probability of the true label lying inside an interval defined around the model's prediction. We solve the risk-assessment problem using the conformal prediction approach, which provides prediction intervals that are guaranteed to contain the true label with a given probability. Using this coverage property, we prove that our approximated failure probability is conservative in the sense that it is not lower than the true failure probability of the ML algorithm. We conduct extensive experiments to empirically study the accuracy of the proposed method for problems with and without covariate shift. Our analysis focuses on different modeling regimes, dataset sizes, and conformal prediction methodologies.
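    A minimal sketch of the split-conformal recipe underlying such a risk assessment, assuming exchangeable calibration data and an absolute-residual score; the paper's exact conformal variants and covariate-shift handling are richer than this, and all names below are placeholders.

```python
import numpy as np

def split_conformal_half_width(calibration_residuals, alpha):
    """Half-width of a split-conformal prediction interval computed from held-out
    absolute residuals |y - yhat|; by exchangeability, an interval of this half-width
    around a new prediction misses the true label with probability at most alpha."""
    n = len(calibration_residuals)
    q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
    return np.quantile(calibration_residuals, q_level, method="higher")  # numpy >= 1.22

rng = np.random.default_rng(0)
residuals = np.abs(rng.normal(size=500))     # stand-in for |y - model(x)| on calibration data
half_width = split_conformal_half_width(residuals, alpha=0.1)
# Risk assessment: the (conservative) failure probability of the interval
# [prediction - half_width, prediction + half_width] is bounded by alpha = 0.1.
print(half_width)
```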
    Time-Varying Propensity Score to Bridge the Gap between the Past and Present. (arXiv:2210.01422v4 [cs.LG] UPDATED)
    Real-world deployment of machine learning models is challenging because data evolves over time. While no model can work when data evolves in an arbitrary fashion, if there is some pattern to these changes, we might be able to design methods to address it. This paper addresses situations when data evolves gradually. We introduce a time-varying propensity score that can detect gradual shifts in the distribution of data which allows us to selectively sample past data to update the model -- not just similar data from the past like that of a standard propensity score but also data that evolved in a similar fashion in the past. The time-varying propensity score is quite general: we demonstrate different ways of implementing it and evaluate it on a variety of problems ranging from supervised learning (e.g., image classification problems) where data undergoes a sequence of gradual shifts, to reinforcement learning tasks (e.g., robotic manipulation and continuous control) where data shifts as the policy or the task changes.
    On Convergence of Federated Averaging Langevin Dynamics. (arXiv:2112.05120v4 [stat.ML] UPDATED)
    We propose a federated averaging Langevin algorithm (FA-LD) for uncertainty quantification and mean predictions with distributed clients. In particular, we generalize beyond normal posterior distributions and consider a general class of models. We develop theoretical guarantees for FA-LD for strongly log-concave distributions with non-i.i.d. data and study how the injected noise and the stochastic-gradient noise, the heterogeneity of data, and the varying learning rates affect the convergence. Such an analysis sheds light on the optimal choice of local updates to minimize communication costs. Important to our approach is that the communication efficiency does not deteriorate with the injected noise in the Langevin algorithms. In addition, we examine both independent and correlated noise used over different clients in our FA-LD algorithm. We observe a trade-off among communication, accuracy, and data privacy. As local devices may become inactive in federated networks, we also show convergence results based on different averaging schemes where only partial device updates are available. In such a case, we discover an additional bias that does not decay to zero.
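    A minimal, illustrative round of such an algorithm might look as follows: each client runs a few unadjusted Langevin steps (noisy gradient steps) from the shared parameters and the server averages the results. The client callables, step size, and temperature are placeholders, and the convergence-relevant choices analyzed in the paper (noise correlation, partial participation, learning-rate schedules) are not modeled here.

```python
import copy
import torch

def fa_ld_round(global_params, client_grad_fns, lr=1e-3, local_steps=5, temperature=1.0):
    """One communication round of a federated-averaging Langevin sketch: each client
    runs a few unadjusted Langevin steps from the shared parameters, then the server
    averages the resulting iterates. All names and schedules are illustrative."""
    client_states = []
    for grad_fn in client_grad_fns:                       # one callable per client
        theta = copy.deepcopy(global_params)
        for _ in range(local_steps):
            g = grad_fn(theta)                            # gradient of the local negative log-posterior
            noise = torch.randn_like(theta) * (2.0 * lr * temperature) ** 0.5
            theta = theta - lr * g + noise                # unadjusted Langevin update
        client_states.append(theta)
    return torch.stack(client_states).mean(dim=0)         # server-side averaging

# Toy usage: two clients whose local losses are 0.5 * ||theta - c_i||^2.
centers = [torch.tensor([1.0, 1.0]), torch.tensor([-1.0, 0.0])]
grad_fns = [lambda th, c=c: th - c for c in centers]
print(fa_ld_round(torch.zeros(2), grad_fns))
```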
    TPDR: A Novel Two-Step Transformer-based Product and Class Description Match and Retrieval Method. (arXiv:2310.03491v1 [cs.IR])
    There is a niche of companies responsible for intermediating the purchase of large batches of varied products for other companies, for which the main challenge is to perform product description standardization, i.e., matching an item described by a client with a product described in a catalog. The problem is complex since the client's product description may be: (1) potentially noisy; (2) short and uninformative (e.g., missing information about model and size); and (3) cross-language. In this paper, we formalize this problem as a ranking task: given an initial client product specification (IS, the query), return the most appropriate standardized descriptions (SD, the response). We propose TPDR, a two-step Transformer-based Product and Class Description Retrieval method that is able to explore the semantic correspondence between IS and SD, by exploiting attention mechanisms and contrastive learning. First, TPDR employs transformers as two encoders sharing the embedding vector space: one for encoding the IS and another for the SD, in which corresponding pairs (IS, SD) must be close in the vector space. Closeness is further enforced by a contrastive learning mechanism leveraging a specialized loss function. TPDR also exploits a (second) re-ranking step based on syntactic features that are very important for the exact matching (model, dimension) of certain products that may have been neglected by the transformers. To evaluate our proposal, we consider 11 datasets from a real company, covering different application contexts. Our solution was able to retrieve the correct standardized product before the 5th ranking position in 71% of the cases and its correct category in the first position in 80% of the situations. Moreover, the effectiveness gains over purely syntactic or semantic baselines reach up to 3.7 times, solving cases that neither approach can handle in isolation.
    Plug-and-Play Posterior Sampling under Mismatched Measurement and Prior Models. (arXiv:2310.03546v1 [stat.ML])
    Posterior sampling has been shown to be a powerful Bayesian approach for solving imaging inverse problems. The recent plug-and-play unadjusted Langevin algorithm (PnP-ULA) has emerged as a promising method for Monte Carlo sampling and minimum mean squared error (MMSE) estimation by combining physical measurement models with deep-learning priors specified using image denoisers. However, the intricate relationship between the sampling distribution of PnP-ULA and the mismatched data-fidelity and denoiser has not been theoretically analyzed. We address this gap by proposing a posterior-L2 pseudometric and using it to quantify an explicit error bound for PnP-ULA under mismatched posterior distribution. We numerically validate our theory on several inverse problems such as sampling from Gaussian mixture models and image deblurring. Our results suggest that the sensitivity of the sampling distribution of PnP-ULA to a mismatch in the measurement model and the denoiser can be precisely characterized.
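    For orientation, the generic PnP-ULA iteration combines a data-fidelity gradient, a denoiser-induced prior score of the form (D(x) - x)/eps, and injected Gaussian noise. The sketch below omits the stabilisation and projection terms used in practice, and the toy denoiser is obviously not a learned one; all names are placeholders.

```python
import numpy as np

def pnp_ula_step(x, grad_data_fidelity, denoiser, step, eps, rng):
    """One generic plug-and-play unadjusted Langevin step: data-fidelity gradient,
    a denoiser-induced prior score (D(x) - x) / eps, and injected Gaussian noise.
    The stabilisation/projection terms used in the actual PnP-ULA are omitted."""
    score_prior = (denoiser(x) - x) / eps
    noise = rng.standard_normal(x.shape) * np.sqrt(2.0 * step)
    return x - step * grad_data_fidelity(x) + step * score_prior + noise

# Toy usage with a scalar observation and a linear stand-in for a learned denoiser.
rng = np.random.default_rng(0)
y = np.array([2.0])
grad_f = lambda x: x - y           # gradient of 0.5 * ||x - y||^2
denoiser = lambda x: 0.9 * x       # placeholder; in practice a trained image denoiser
x = np.zeros(1)
for _ in range(1000):
    x = pnp_ula_step(x, grad_f, denoiser, step=1e-2, eps=1.0, rng=rng)
print(x)
```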
    Pre-Training and Fine-Tuning Generative Flow Networks. (arXiv:2310.03419v1 [cs.LG])
    Generative Flow Networks (GFlowNets) are amortized samplers that learn stochastic policies to sequentially generate compositional objects from a given unnormalized reward distribution. They can generate diverse sets of high-reward objects, which is an important consideration in scientific discovery tasks. However, as they are typically trained from a given extrinsic reward function, it remains an important open challenge how to leverage the power of pre-training and train GFlowNets in an unsupervised fashion for efficient adaptation to downstream tasks. Inspired by recent successes of unsupervised pre-training in various domains, we introduce a novel approach for reward-free pre-training of GFlowNets. By framing the training as a self-supervised problem, we propose an outcome-conditioned GFlowNet (OC-GFN) that learns to explore the candidate space. Specifically, OC-GFN learns to reach any targeted outcomes, akin to goal-conditioned policies in reinforcement learning. We show that the pre-trained OC-GFN model can allow for a direct extraction of a policy capable of sampling from any new reward functions in downstream tasks. Nonetheless, adapting OC-GFN to a downstream task-specific reward involves an intractable marginalization over possible outcomes. We propose a novel way to approximate this marginalization by learning an amortized predictor enabling efficient fine-tuning. Extensive experimental results validate the efficacy of our approach, demonstrating the effectiveness of pre-training the OC-GFN, and its ability to swiftly adapt to downstream tasks and discover modes more efficiently. This work may serve as a foundation for further exploration of pre-training strategies in the context of GFlowNets.
    GOAL: A Challenging Knowledge-grounded Video Captioning Benchmark for Real-time Soccer Commentary Generation. (arXiv:2303.14655v2 [cs.CV] UPDATED)
    Despite the recent emergence of video captioning models, how to generate vivid, fine-grained video descriptions based on the background knowledge (i.e., long and informative commentary about the domain-specific scenes with appropriate reasoning) is still far from being solved, which however has great applications such as automatic sports narrative. In this paper, we present GOAL, a benchmark of over 8.9k soccer video clips, 22k sentences, and 42k knowledge triples for proposing a challenging new task setting as Knowledge-grounded Video Captioning (KGVC). Moreover, we conduct experimental adaption of existing methods to show the difficulty and potential directions for solving this valuable and applicable task. Our data and code are available at https://github.com/THU-KEG/goal.
    GRAPES: Learning to Sample Graphs for Scalable Graph Neural Networks. (arXiv:2310.03399v1 [cs.LG])
    Graph neural networks (GNNs) learn the representation of nodes in a graph by aggregating the neighborhood information in various ways. As these networks grow in depth, their receptive field grows exponentially due to the increase in neighborhood sizes, resulting in high memory costs. Graph sampling solves memory issues in GNNs by sampling a small ratio of the nodes in the graph. This way, GNNs can scale to much larger graphs. Most sampling methods focus on fixed sampling heuristics, which may not generalize to different structures or tasks. We introduce GRAPES, an adaptive graph sampling method that learns to identify sets of influential nodes for training a GNN classifier. GRAPES uses a GFlowNet to learn node sampling probabilities given the classification objectives. We evaluate GRAPES across several small- and large-scale graph benchmarks and demonstrate its effectiveness in accuracy and scalability. In contrast to existing sampling methods, GRAPES maintains high accuracy even with small sample sizes and, therefore, can scale to very large graphs. Our code is publicly available at https://github.com/dfdazac/grapes.
    Adapting Large Language Models for Content Moderation: Pitfalls in Data Engineering and Supervised Fine-tuning. (arXiv:2310.03400v1 [cs.LG])
    Nowadays, billions of people engage in communication and express their opinions on the internet daily. Unfortunately, not all of these expressions are friendly or compliant, making content moderation an indispensable task. With the successful development of Large Language Models (LLMs) in recent years, LLM-based methods have become a feasible solution for handling tasks in various domains. However, in the field of content moderation, there is still a lack of detailed work that systematically introduces implementation details. In this paper, we introduce how to fine-tune an LLM model that can be privately deployed for content moderation. Specifically, we discuss whether incorporating reasons during the fine-tuning process would be better or if it should be treated as a classification task directly. We also explore the benefits of utilizing reasons generated by more powerful LLMs for fine-tuning privately deployed models and the impact of different processing approaches when the answers generated by the more powerful LLMs are incorrect. We report the entire research process and the key findings in this paper, hoping to provide valuable experience for researchers who are fine-tuning privately deployed models in their domain-specific research.
    Investigating the Limitation of CLIP Models: The Worst-Performing Categories. (arXiv:2310.03324v1 [cs.CV])
    Contrastive Language-Image Pre-training (CLIP) provides a foundation model by integrating natural language into visual concepts, enabling zero-shot recognition on downstream tasks. It is usually expected that satisfactory overall accuracy can be achieved across numerous domains through well-designed textual prompts. However, we found that their performance in the worst categories is significantly inferior to the overall performance. For example, on ImageNet, there are a total of 10 categories with class-wise accuracy as low as 0%, even though the overall performance has achieved 64.1%. This phenomenon reveals the potential risks associated with using CLIP models, particularly in risk-sensitive applications where specific categories hold significant importance. To address this issue, we investigate the alignment between the two modalities in the CLIP model and propose the Class-wise Matching Margin (CMM) to measure the inference confusion. CMM can effectively identify the worst-performing categories and estimate the potential performance of the candidate prompts. We further query large language models to enrich descriptions of worst-performing categories and build a weighted ensemble to highlight the efficient prompts. Experimental results clearly verify the effectiveness of our proposal, where the accuracy on the worst-10 categories on ImageNet is boosted to 5.2%, without manual prompt engineering, laborious optimization, or access to labeled validation data.
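    The margin idea can be illustrated in a few lines of numpy: for each class, average the gap between the image-text similarity of the correct prompt and the best competing prompt; small or negative margins flag the worst-performing categories. This is a sketch of the general idea under assumed inputs, not the paper's exact CMM definition.

```python
import numpy as np

def classwise_margin(similarities, labels, num_classes):
    """For each class, the average gap between the similarity assigned to the correct
    text prompt and the best competing prompt; small or negative values flag
    worst-performing categories (a sketch of the margin idea, not the exact CMM)."""
    margins = np.full(num_classes, np.nan)
    for c in range(num_classes):
        idx = labels == c
        if not idx.any():
            continue
        s = similarities[idx]                                  # (n_c, num_classes) image-text scores
        competing = np.max(np.delete(s, c, axis=1), axis=1)    # best wrong-class score
        margins[c] = float(np.mean(s[:, c] - competing))
    return margins

rng = np.random.default_rng(0)
sims = rng.normal(size=(100, 10))              # stand-in for CLIP image-text similarities
labels = rng.integers(0, 10, size=100)
print(classwise_margin(sims, labels, num_classes=10))
```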
    Uncertainty quantification for deep learning-based schemes for solving high-dimensional backward stochastic differential equations. (arXiv:2310.03393v1 [math.NA])
    Deep learning-based numerical schemes for solving high-dimensional backward stochastic differential equations (BSDEs) have recently raised plenty of scientific interest. While they enable numerical methods to approximate very high-dimensional BSDEs, their reliability has not been studied and is thus not understood. In this work, we study uncertainty quantification (UQ) for a class of deep learning-based BSDE schemes. More precisely, we review the sources of uncertainty involved in the schemes and numerically study the impact of different sources. Usually, the standard deviation (STD) of the approximate solutions obtained from multiple runs of the algorithm with different datasets is calculated to address the uncertainty. This approach is computationally quite expensive, especially for high-dimensional problems. Hence, we develop a UQ model that efficiently estimates the STD of the approximate solution using only a single run of the algorithm. The model also estimates the mean of the approximate solution, which can be leveraged to initialize the algorithm and improve the optimization process. Our numerical experiments show that the UQ model produces reliable estimates of the mean and STD of the approximate solution for the considered class of deep learning-based BSDE schemes. The estimated STD captures multiple sources of uncertainty, demonstrating its effectiveness in quantifying the uncertainty. Additionally, the model illustrates the improved performance when comparing different schemes based on the estimated STD values. Furthermore, it can identify hyperparameter values for which the scheme achieves good approximations.
    Stable Training of Probabilistic Models Using the Leave-One-Out Maximum Log-Likelihood Objective. (arXiv:2310.03556v1 [stat.ML])
    Probabilistic modelling of power systems operation and planning processes depends on data-driven methods, which require sufficiently large datasets. When historical data lacks this, it is desirable to model the underlying data generation mechanism as a probability distribution to assess the data quality and generate more data, if needed. Kernel density estimation (KDE) based models are popular choices for this task, but they fail to adapt to data regions with varying densities. In this paper, an adaptive KDE model is employed to circumvent this, where each kernel in the model has an individual bandwidth. The leave-one-out maximum log-likelihood (LOO-MLL) criterion is proposed to prevent the singular solutions that the regular MLL criterion gives rise to, and this property is proven. Relying on this guaranteed robustness, the model is extended by assigning learnable weights to the kernels. In addition, a modified expectation-maximization algorithm is employed to accelerate the optimization speed reliably. The performance of the proposed method and models is demonstrated on two power systems datasets using different statistical tests and by comparison with Gaussian mixture models. Results show that the proposed models have promising performance, in addition to their singularity prevention guarantees.
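    A minimal numpy sketch of the leave-one-out objective for a one-dimensional Gaussian KDE with one bandwidth per kernel: each point is scored under the mixture of all other kernels, so shrinking a bandwidth to zero on its own data point no longer pays off. The learnable kernel weights and the modified EM updates from the paper are not included, and the bandwidth values are placeholders.

```python
import numpy as np

def loo_mll(data, bandwidths):
    """Leave-one-out log-likelihood of a 1-D Gaussian KDE with one bandwidth per kernel:
    each point is scored under the mixture of all *other* kernels, so shrinking a
    bandwidth to zero on its own data point no longer pays off."""
    n = len(data)
    diffs = data[None, :] - data[:, None]                    # rows: kernels, columns: evaluation points
    h = bandwidths[:, None]
    k = np.exp(-0.5 * (diffs / h) ** 2) / (h * np.sqrt(2 * np.pi))
    np.fill_diagonal(k, 0.0)                                 # leave each point's own kernel out
    loo_density = k.sum(axis=0) / (n - 1)
    return float(np.sum(np.log(loo_density + 1e-300)))

rng = np.random.default_rng(1)
x = rng.normal(size=200)
print(loo_mll(x, bandwidths=np.full(200, 0.3)))
```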
    A Latent Variable Approach for Non-Hierarchical Multi-Fidelity Adaptive Sampling. (arXiv:2310.03298v1 [stat.ML])
    Multi-fidelity (MF) methods are gaining popularity for enhancing surrogate modeling and design optimization by incorporating data from various low-fidelity (LF) models. While most existing MF methods assume a fixed dataset, adaptive sampling methods that dynamically allocate resources among fidelity models can achieve higher efficiency in exploring and exploiting the design space. However, most existing MF methods rely on the hierarchical assumption of fidelity levels, or fail to capture the intercorrelation between multiple fidelity levels and to utilize it to quantify the value of future samples and guide the adaptive sampling. To address this hurdle, we propose a framework hinged on a latent embedding for different fidelity models and the associated pre-posterior analysis to explicitly utilize their correlation for adaptive sampling. In this framework, each infill sampling iteration includes two steps: We first identify the location of interest with the greatest potential improvement using the high-fidelity (HF) model, then we search for the next sample across all fidelity levels that maximizes the improvement per unit cost at the location identified in the first step. This is made possible by a single Latent Variable Gaussian Process (LVGP) model that maps different fidelity models into an interpretable latent space to capture their correlations without assuming hierarchical fidelity levels. The LVGP enables us to assess how LF sampling candidates will affect the HF response with pre-posterior analysis and determine the next sample with the best benefit-to-cost ratio. Through test cases, we demonstrate that the proposed method outperforms the benchmark methods in both MF global fitting (GF) and Bayesian Optimization (BO) problems in convergence rate and robustness. Moreover, the method offers the flexibility to switch between GF and BO by simply changing the acquisition function.
    Text as Environment: A Deep Reinforcement Learning Text Readability Assessment Model. (arXiv:1912.05957v3 [cs.CL] UPDATED)
    Evaluating the readability of a text can significantly facilitate the precise expression of information in written form. The formulation of text readability assessment involves the identification of meaningful properties of the text regardless of its length. Sophisticated features and models are used to evaluate the comprehensibility of texts accurately. Despite this, the problem of assessing texts' readability efficiently remains relatively untouched. The efficiency of state-of-the-art text readability assessment models can be further improved using deep reinforcement learning models. Using a hard attention-based active inference technique, the proposed approach makes efficient use of input text and computational resources. Through the use of semi-supervised signals, the reinforcement learning model uses the minimum amount of text in order to determine a text's readability. A comparison of the model on Weebit and Cambridge Exams with state-of-the-art models, such as the BERT text readability model, shows that it is capable of achieving state-of-the-art accuracy with a significantly smaller amount of input text than other models.
    Neural Language Model Pruning for Automatic Speech Recognition. (arXiv:2310.03424v1 [cs.LG])
    We study model pruning methods applied to Transformer-based neural network language models for automatic speech recognition. We explore three aspects of the pruning framework, namely criterion, method and scheduler, analyzing their contribution in terms of accuracy and inference speed. To the best of our knowledge, such in-depth analyses on large-scale recognition systems have not been reported in the literature. In addition, we propose a variant of low-rank approximation suitable for incrementally compressing models, and delivering multiple models with varied target sizes. Among other results, we show that a) data-driven pruning outperforms magnitude-driven pruning in several scenarios; b) incremental pruning achieves higher accuracy compared to one-shot pruning, especially when targeting smaller sizes; and c) low-rank approximation presents the best trade-off between size reduction and inference speed-up for moderate compression.
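    As a rough illustration of the low-rank direction (not the paper's exact variant), a dense weight matrix can be replaced by two truncated-SVD factors; keeping different numbers of singular components from the same decomposition is one way to deliver several model sizes incrementally. The matrix size and rank below are arbitrary.

```python
import numpy as np

def low_rank_compress(weight, rank):
    """Replace a dense weight matrix W (out x in) by two truncated-SVD factors of
    total size rank * (out + in). Keeping different numbers of singular components
    from one decomposition allows delivering several model sizes incrementally."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    left = u[:, :rank] * s[:rank]            # (out, rank)
    right = vt[:rank, :]                     # (rank, in)
    return left, right

w = np.random.default_rng(0).normal(size=(512, 512))
left, right = low_rank_compress(w, rank=64)
print(np.linalg.norm(w - left @ right) / np.linalg.norm(w))   # relative approximation error
```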
    Probabilistic Forecasting of Day-Ahead Electricity Prices and their Volatility with LSTMs. (arXiv:2310.03339v1 [cs.LG])
    Accurate forecasts of electricity prices are crucial for the management of electric power systems and the development of smart applications. European electricity prices have risen substantially and became highly volatile after the Russian invasion of Ukraine, challenging established forecasting methods. Here, we present a Long Short-Term Memory (LSTM) model for the German-Luxembourg day-ahead electricity prices addressing these challenges. The recurrent structure of the LSTM allows the model to adapt to trends, while the joint prediction of both mean and standard deviation enables a probabilistic prediction. Using a physics-inspired approach - superstatistics - to derive an explanation for the statistics of prices, we show that the LSTM model faithfully reproduces both prices and their volatility.
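    A generic sketch of the joint mean-and-volatility prediction described above: an LSTM head outputs a mean and a log standard deviation per step and is trained with the Gaussian negative log-likelihood. Architecture sizes and data shapes are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ProbLSTM(nn.Module):
    """LSTM that outputs a mean and a standard deviation per time step and is trained
    with the Gaussian negative log-likelihood (generic sketch, sizes are placeholders)."""
    def __init__(self, n_features=1, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)                   # per-step mean and log-std

    def forward(self, x):
        h, _ = self.lstm(x)
        out = self.head(h)
        return out[..., 0], out[..., 1].exp()              # mean, std

model = ProbLSTM()
x = torch.randn(8, 24, 1)                                  # batch of 24-step price windows
target = torch.randn(8, 24)
mean, std = model(x)
loss = nn.GaussianNLLLoss()(mean, target, std.pow(2))      # expects the variance, not the std
loss.backward()
```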
    Deep Controlled Learning for Inventory Control. (arXiv:2011.15122v6 [cs.LG] UPDATED)
    Problem Definition: Are traditional deep reinforcement learning (DRL) algorithms, developed for a broad range of purposes including game-play and robotics, the most suitable machine learning algorithms for applications in inventory control? To what extent would DRL algorithms tailored to the unique characteristics of inventory control problems provide superior performance compared to DRL and traditional benchmarks? Methodology/results: We propose and study Deep Controlled Learning (DCL), a new DRL framework based on approximate policy iteration specifically designed to tackle inventory problems. Comparative evaluations reveal that DCL outperforms existing state-of-the-art heuristics in lost sales inventory control, perishable inventory systems, and inventory systems with random lead times, achieving lower average costs across all test instances and maintaining an optimality gap of no more than 0.1\%. Notably, the same hyperparameter set is utilized across all experiments, underscoring the robustness and generalizability of the proposed method. Managerial implications: These substantial performance and robustness improvements pave the way for the effective application of tailored DRL algorithms to inventory management problems, empowering decision-makers to optimize stock levels, minimize costs, and enhance responsiveness across various industries.
    FedNAR: Federated Optimization with Normalized Annealing Regularization. (arXiv:2310.03163v1 [cs.LG])
    Weight decay is a standard technique to improve generalization performance in modern deep neural network optimization, and is also widely adopted in federated learning (FL) to prevent overfitting in local clients. In this paper, we first explore the choices of weight decay and identify that weight decay value appreciably influences the convergence of existing FL algorithms. While preventing overfitting is crucial, weight decay can introduce a different optimization goal towards the global objective, which is further amplified in FL due to multiple local updates and heterogeneous data distribution. To address this challenge, we develop {\it Federated optimization with Normalized Annealing Regularization} (FedNAR), a simple yet effective and versatile algorithmic plug-in that can be seamlessly integrated into any existing FL algorithms. Essentially, we regulate the magnitude of each update by performing co-clipping of the gradient and weight decay. We provide a comprehensive theoretical analysis of FedNAR's convergence rate and conduct extensive experiments on both vision and language datasets with different backbone federated optimization algorithms. Our experimental results consistently demonstrate that incorporating FedNAR into existing FL algorithms leads to accelerated convergence and heightened model accuracy. Moreover, FedNAR exhibits resilience in the face of various hyperparameter configurations. Specifically, FedNAR has the ability to self-adjust the weight decay when the initial specification is not optimal, while the accuracy of traditional FL algorithms would markedly decline. Our codes are released at \href{https://github.com/ljb121002/fednar}{https://github.com/ljb121002/fednar}.
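    The co-clipping idea can be sketched as clipping the combined gradient-plus-weight-decay direction before applying the step; FedNAR's exact normalization and annealing schedule may differ from this simplified version.

```python
import torch

def co_clipped_update(param, grad, lr, weight_decay, max_norm):
    """Clip the combined gradient + weight-decay direction before stepping
    (a sketch of the co-clipping idea; the actual FedNAR scaling may differ)."""
    update = grad + weight_decay * param
    norm = update.norm()
    if norm > max_norm:
        update = update * (max_norm / norm)
    return param - lr * update
```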
    Resilient Legged Local Navigation: Learning to Traverse with Compromised Perception End-to-End. (arXiv:2310.03581v1 [cs.RO])
    Autonomous robots must navigate reliably in unknown environments even under compromised exteroceptive perception, or perception failures. Such failures often occur when harsh environments lead to degraded sensing, or when the perception algorithm misinterprets the scene due to limited generalization. In this paper, we model perception failures as invisible obstacles and pits, and train a reinforcement learning (RL) based local navigation policy to guide our legged robot. Unlike previous works relying on heuristics and anomaly detection to update navigational information, we train our navigation policy to reconstruct the environment information in the latent space from corrupted perception and react to perception failures end-to-end. To this end, we incorporate both proprioception and exteroception into our policy inputs, thereby enabling the policy to sense collisions on different body parts and pits, prompting corresponding reactions. We validate our approach in simulation and on the real quadruped robot ANYmal running in real-time (<10 ms CPU inference). In a quantitative comparison with existing heuristic-based locally reactive planners, our policy increases the success rate by over 30% when facing perception failures. Project Page: https://bit.ly/45NBTuh.
    TimeGPT-1. (arXiv:2310.03589v1 [cs.LG])
    In this paper, we introduce TimeGPT, the first foundation model for time series, capable of generating accurate predictions for diverse datasets not seen during training. We evaluate our pre-trained model against established statistical, machine learning, and deep learning methods, demonstrating that TimeGPT zero-shot inference excels in performance, efficiency, and simplicity. Our study provides compelling evidence that insights from other domains of artificial intelligence can be effectively applied to time series analysis. We conclude that large-scale time series models offer an exciting opportunity to democratize access to precise predictions and reduce uncertainty by leveraging the capabilities of contemporary advancements in deep learning.
    Conditional Generative Models for Simulation of EMG During Naturalistic Movements. (arXiv:2211.01856v4 [cs.LG] UPDATED)
    Numerical models of electromyographic (EMG) signals have provided a huge contribution to our fundamental understanding of human neurophysiology and remain a central pillar of motor neuroscience and the development of human-machine interfaces. However, whilst modern biophysical simulations based on finite element methods are highly accurate, they are extremely computationally expensive and thus are generally limited to modelling static systems such as isometrically contracting limbs. As a solution to this problem, we propose a transfer learning approach, in which a conditional generative model is trained to mimic the output of an advanced numerical model. To this end, we present BioMime, a conditional generative neural network trained adversarially to generate motor unit activation potential waveforms under a wide variety of volume conductor parameters. We demonstrate the ability of such a model to predictively interpolate between a much smaller number of outputs of the numerical model with high accuracy. Consequently, the computational load is dramatically reduced, which allows the rapid simulation of EMG signals during truly dynamic and naturalistic movements.
    A 5' UTR Language Model for Decoding Untranslated Regions of mRNA and Function Predictions. (arXiv:2310.03281v1 [cs.LG])
    The 5' UTR, a regulatory region at the beginning of an mRNA molecule, plays a crucial role in regulating the translation process and impacts the protein expression level. Language models have showcased their effectiveness in decoding the functions of protein and genome sequences. Here, we introduced a language model for 5' UTR, which we refer to as the UTR-LM. The UTR-LM is pre-trained on endogenous 5' UTRs from multiple species and is further augmented with supervised information including secondary structure and minimum free energy. We fine-tuned the UTR-LM in a variety of downstream tasks. The model outperformed the best-known benchmark by up to 42% for predicting the Mean Ribosome Loading, and by up to 60% for predicting the Translation Efficiency and the mRNA Expression Level. The model also applies to identifying unannotated Internal Ribosome Entry Sites within the untranslated region and improves the AUPR from 0.37 to 0.52 compared to the best baseline. Further, we designed a library of 211 novel 5' UTRs with high predicted values of translation efficiency and evaluated them via a wet-lab assay. Experiment results confirmed that our top designs achieved a 32.5% increase in protein production level relative to well-established 5' UTR optimized for therapeutics.
    High-dimensional Bayesian Optimization with Group Testing. (arXiv:2310.03515v1 [cs.LG])
    Bayesian optimization is an effective method for optimizing expensive-to-evaluate black-box functions. High-dimensional problems are particularly challenging as the surrogate model of the objective suffers from the curse of dimensionality, which makes accurate modeling difficult. We propose a group testing approach to identify active variables to facilitate efficient optimization in these domains. The proposed algorithm, Group Testing Bayesian Optimization (GTBO), first runs a testing phase where groups of variables are systematically selected and tested on whether they influence the objective. To that end, we extend the well-established theory of group testing to functions of continuous ranges. In the second phase, GTBO guides optimization by placing more importance on the active dimensions. By exploiting the axis-aligned subspace assumption, GTBO is competitive against state-of-the-art methods on several synthetic and real-world high-dimensional optimization tasks. Furthermore, GTBO aids in the discovery of active parameters in applications, thereby enhancing practitioners' understanding of the problem at hand.
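    A toy version of the testing phase might probe whether a group of dimensions influences the objective by jittering only those dimensions and comparing against a reference evaluation; GTBO's actual group construction and statistical decision rule are more principled than this sketch, and the names below are placeholders.

```python
import numpy as np

def group_is_active(f, x_ref, group_idx, n_probes=8, noise_tol=1e-3, seed=0):
    """Crude test of whether a group of dimensions influences f: perturb only
    those dimensions (domain assumed to be [0, 1]^d) and compare with f(x_ref)."""
    rng = np.random.default_rng(seed)
    base = f(x_ref)
    deviations = []
    for _ in range(n_probes):
        x = x_ref.copy()
        x[group_idx] = rng.uniform(0.0, 1.0, size=len(group_idx))
        deviations.append(abs(f(x) - base))
    return np.mean(deviations) > noise_tol
```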
    Combining Differential Privacy and Byzantine Resilience in Distributed SGD. (arXiv:2110.03991v4 [cs.LG] UPDATED)
    Privacy and Byzantine resilience (BR) are two crucial requirements of modern-day distributed machine learning. The two concepts have been extensively studied individually but the question of how to combine them effectively remains unanswered. This paper contributes to addressing this question by studying the extent to which the distributed SGD algorithm, in the standard parameter-server architecture, can learn an accurate model despite (a) a fraction of the workers being malicious (Byzantine), and (b) the other fraction, whilst being honest, providing noisy information to the server to ensure differential privacy (DP). We first observe that the integration of standard practices in DP and BR is not straightforward. In fact, we show that many existing results on the convergence of distributed SGD under Byzantine faults, especially those relying on $(\alpha,f)$-Byzantine resilience, are rendered invalid when honest workers enforce DP. To circumvent this shortcoming, we revisit the theory of $(\alpha,f)$-BR to obtain an approximate convergence guarantee. Our analysis provides key insights on how to improve this guarantee through hyperparameter optimization. Essentially, our theoretical and empirical results show that (1) an imprudent combination of standard approaches to DP and BR might be fruitless, but (2) by carefully re-tuning the learning algorithm, we can obtain reasonable learning accuracy while simultaneously guaranteeing DP and BR.
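    A simplified picture of the combination has honest workers clipping and noising gradients for DP while the server uses a coordinate-wise median as a Byzantine-resilient aggregator; this is illustrative background only, not the paper's $(\alpha,f)$-BR construction or its re-tuned variant.

```python
import numpy as np

def dp_noisy_gradient(grad, clip_norm, noise_std, rng):
    """Honest worker: clip the gradient and add Gaussian noise for DP."""
    g = grad * min(1.0, clip_norm / (np.linalg.norm(grad) + 1e-12))
    return g + rng.normal(0.0, noise_std, size=g.shape)

def robust_aggregate(worker_grads):
    """Server: coordinate-wise median as a simple Byzantine-resilient rule."""
    return np.median(np.stack(worker_grads, axis=0), axis=0)
```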
    Benchmarking Large Language Models As AI Research Agents. (arXiv:2310.03302v1 [cs.LG])
    Scientific experimentation involves an iterative process of creating hypotheses, designing experiments, running experiments, and analyzing the results. Can we build AI research agents to perform these long-horizon tasks? To take a step towards building and evaluating research agents on such open-ended decision-making tasks, we focus on the problem of machine learning engineering: given a task description and a dataset, build a high-performing model. In this paper, we propose MLAgentBench, a suite of ML tasks for benchmarking AI research agents. Agents can perform actions like reading/writing files, executing code, and inspecting outputs. With these actions, agents could run experiments, analyze the results, and modify the code of entire machine learning pipelines, such as data processing, architecture, training processes, etc. The benchmark then automatically evaluates the agent's performance objectively over various metrics related to performance and efficiency. We also design an LLM-based research agent to automatically perform experimentation loops in such an environment. Empirically, we find that a GPT-4-based research agent can feasibly build compelling ML models over many tasks in MLAgentBench, displaying highly interpretable plans and actions. However, the success rates vary considerably; they span from almost 90\% on well-established older datasets to as low as 10\% on recent Kaggle Challenges -- unavailable during the LLM model's pretraining -- and even 0\% on newer research challenges like BabyLM. Finally, we identify several key challenges for LLM-based research agents such as long-term planning and hallucination. Our code is released at https://github.com/snap-stanford/MLAgentBench.
    Evaluating the Robustness of Interpretability Methods through Explanation Invariance and Equivariance. (arXiv:2304.06715v3 [cs.LG] UPDATED)
    Interpretability methods are valuable only if their explanations faithfully describe the explained model. In this work, we consider neural networks whose predictions are invariant under a specific symmetry group. This includes popular architectures, ranging from convolutional to graph neural networks. Any explanation that faithfully explains this type of model needs to be in agreement with this invariance property. We formalize this intuition through the notion of explanation invariance and equivariance by leveraging the formalism from geometric deep learning. Through this rigorous formalism, we derive (1) two metrics to measure the robustness of any interpretability method with respect to the model symmetry group; (2) theoretical robustness guarantees for some popular interpretability methods and (3) a systematic approach to increase the invariance of any interpretability method with respect to a symmetry group. By empirically measuring our metrics for explanations of models associated with various modalities and symmetry groups, we derive a set of 5 guidelines to allow users and developers of interpretability methods to produce robust explanations.
    OpenPatch: a 3D patchwork for Out-Of-Distribution detection. (arXiv:2310.03388v1 [cs.CV])
    Moving deep learning models from the laboratory setting to the open world entails preparing them to handle unforeseen conditions. In several applications the occurrence of novel classes during deployment poses a significant threat, thus it is crucial to effectively detect them. Ideally, this skill should be used when needed without requiring any further computational training effort at every new task. Out-of-distribution detection has attracted significant attention in recent years; however, the majority of studies deal with 2D images, ignoring the inherent 3D nature of the real world and often confusing domain and semantic novelty. In this work, we focus on the latter, considering the objects' geometric structure captured by 3D point clouds regardless of the specific domain. We advance the field by introducing OpenPatch, which builds on a large pre-trained model and simply extracts from its intermediate features a set of patch representations that describe each known class. For any new sample, we obtain a novelty score by evaluating whether it can be recomposed mainly by patches of a single known class or rather via the contribution of multiple classes. We present an extensive experimental evaluation of our approach for the task of semantic novelty detection on real-world point cloud samples when the reference known data are synthetic. We demonstrate that OpenPatch excels in both the full and few-shot known sample scenarios, showcasing its robustness across varying pre-training objectives and network backbones. The inherent training-free nature of our method allows for its immediate application to a wide array of real-world tasks, offering a compelling advantage over approaches that need expensive retraining efforts.
    A Framework for Large Scale Synthetic Graph Dataset Generation. (arXiv:2210.01944v4 [cs.LG] UPDATED)
    Recently there has been increasing interest in developing and deploying deep graph learning algorithms for many tasks, such as fraud detection and recommender systems. However, there is a limited number of publicly available graph-structured datasets, most of which are tiny compared to production-sized applications or are limited in their application domain. This work tackles this shortcoming by proposing a scalable synthetic graph generation tool to scale the datasets to production-size graphs with trillions of edges and billions of nodes. The tool learns a series of parametric models from proprietary datasets that can be released to researchers to study various graph methods on the synthetic data, accelerating prototype development and enabling novel applications. We demonstrate the generalizability of the framework across a series of datasets, mimicking structural and feature distributions as well as the ability to scale them across varying sizes, demonstrating their usefulness for benchmarking and model development. Code can be found on https://github.com/NVIDIA/DeepLearningExamples/tree/master/Tools/DGLPyTorch/SyntheticGraphGeneration.
    Two-stage LLM Fine-tuning with Less Specialization and More Generalization. (arXiv:2211.00635v2 [cs.CL] UPDATED)
    Pretrained large language models (LLMs) are general-purpose problem solvers applicable to a diverse set of tasks with prompts. They can be further improved towards a specific task by fine-tuning on a specialized dataset. However, fine-tuning usually makes the model narrowly specialized on this dataset with reduced general in-context learning performances, which is undesirable whenever the fine-tuned model needs to handle additional tasks where no fine-tuning data is available. In this work, we first demonstrate that fine-tuning on a single task indeed decreases LLMs' general in-context learning performance. We discover one important cause of such forgetting, format specialization, where the model overfits to the format of the fine-tuned task. We further show that format specialization happens at the very beginning of fine-tuning. To solve this problem, we propose Prompt Tuning with MOdel Tuning (ProMoT), a simple yet effective two-stage fine-tuning framework that reduces format specialization and improves generalization. ProMoT offloads task-specific format learning into additional and removable parameters by first doing prompt tuning and then fine-tuning the model itself with this soft prompt attached. With experiments on several fine-tuning tasks and 8 in-context evaluation tasks, we show that ProMoT achieves comparable performance on fine-tuned tasks to standard fine-tuning, but with much less loss of in-context learning performance across a broad range of out-of-domain evaluation tasks. More importantly, ProMoT can even enhance generalization on in-context learning tasks that are semantically related to the fine-tuned task, e.g. ProMoT on En-Fr translation significantly improves performance on other language pairs, and ProMoT on NLI improves performance on summarization. Experiments also show that ProMoT can improve the generalization performance of multi-task training.
    Enhancing Adversarial Robustness via Score-Based Optimization. (arXiv:2307.04333v2 [cs.LG] UPDATED)
    Adversarial attacks have the potential to mislead deep neural network classifiers by introducing slight perturbations. Developing algorithms that can mitigate the effects of these attacks is crucial for ensuring the safe use of artificial intelligence. Recent studies have suggested that score-based diffusion models are effective in adversarial defenses. However, existing diffusion-based defenses rely on the sequential simulation of the reversed stochastic differential equations of diffusion models, which are computationally inefficient and yield suboptimal results. In this paper, we introduce a novel adversarial defense scheme named ScoreOpt, which optimizes adversarial samples at test-time, towards original clean data in the direction guided by score-based priors. We conduct comprehensive experiments on multiple datasets, including CIFAR10, CIFAR100 and ImageNet. Our experimental results demonstrate that our approach outperforms existing adversarial defenses in terms of both robustness performance and inference speed.
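    Conceptually, test-time purification guided by a score model can be sketched as a few ascent steps along the estimated score (the gradient of the log-density), pushing the perturbed input back toward clean data; ScoreOpt's actual objective and step rule may differ, and `score_model` here is an assumed, pre-trained component.

```python
import torch

def score_guided_purify(x_adv, score_model, steps=20, step_size=0.05):
    """Iteratively nudge an adversarial sample along an estimated score,
    i.e., toward regions of higher clean-data density."""
    x = x_adv.clone()
    for _ in range(steps):
        with torch.no_grad():
            x = x + step_size * score_model(x)   # score_model is assumed given
    return x
```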
    Benchmarking Local Robustness of High-Accuracy Binary Neural Networks for Enhanced Traffic Sign Recognition. (arXiv:2310.03033v1 [cs.CV])
    Traffic signs play a critical role in road safety and traffic management for autonomous driving systems. Accurate traffic sign classification is essential but challenging due to real-world complexities like adversarial examples and occlusions. To address these issues, binary neural networks offer promise in constructing classifiers suitable for resource-constrained devices. In our previous work, we proposed high-accuracy BNN models for traffic sign recognition, focusing on compact size for limited computation and energy resources. To evaluate their local robustness, this paper introduces a set of benchmark problems featuring layers that challenge state-of-the-art verification tools. These layers include binarized convolutions, max pooling, batch normalization, fully connected. The difficulty of the verification problem is given by the high number of network parameters (905k - 1.7 M), of the input dimension (2.7k-12k), and of the number of regions (43) as well by the fact that the neural networks are not sparse. The proposed BNN models and local robustness properties can be checked at https://github.com/ChristopherBrix/vnncomp2023_benchmarks/tree/main/benchmarks/traffic_signs_recognition. The results of the 4th International Verification of Neural Networks Competition (VNN-COMP'23) revealed the fact that 4, out of 7, solvers can handle many of our benchmarks randomly selected (minimum is 6, maximum is 36, out of 45). Surprisingly, tools output also wrong results or missing counterexample (ranging from 1 to 4). Currently, our focus lies in exploring the possibility of achieving a greater count of solved instances by extending the allotted time (previously set at 8 minutes). Furthermore, we are intrigued by the reasons behind the erroneous outcomes provided by the tools for certain benchmarks.
    Comparing Time-Series Analysis Approaches Utilized in Research Papers to Forecast COVID-19 Cases in Africa: A Literature Review. (arXiv:2310.03606v1 [cs.LG])
    This literature review aimed to compare various time-series analysis approaches utilized in forecasting COVID-19 cases in Africa. The study involved a methodical search for English-language research papers published between January 2020 and July 2023, focusing specifically on papers that utilized time-series analysis approaches on COVID-19 datasets in Africa. A variety of databases including PubMed, Google Scholar, Scopus, and Web of Science were utilized for this process. The research papers underwent an evaluation process to extract relevant information regarding the implementation and performance of the time-series analysis models. The study highlighted the different methodologies employed, evaluating their effectiveness and limitations in forecasting the spread of the virus. The result of this review could contribute deeper insights into the field, and future research should consider these insights to improve time series analysis models and explore the integration of different approaches for enhanced public health decision-making.
    Sparse Deep Learning for Time Series Data: Theory and Applications. (arXiv:2310.03243v1 [stat.ML])
    Sparse deep learning has become a popular technique for improving the performance of deep neural networks in areas such as uncertainty quantification, variable selection, and large-scale network compression. However, most existing research has focused on problems where the observations are independent and identically distributed (i.i.d.), and there has been little work on the problems where the observations are dependent, such as time series data and sequential data in natural language processing. This paper aims to address this gap by studying the theory for sparse deep learning with dependent data. We show that sparse recurrent neural networks (RNNs) can be consistently estimated, and their predictions are asymptotically normally distributed under appropriate assumptions, enabling the prediction uncertainty to be correctly quantified. Our numerical results show that sparse deep learning outperforms state-of-the-art methods, such as conformal predictions, in prediction uncertainty quantification for time series data. Furthermore, our results indicate that the proposed method can consistently identify the autoregressive order for time series data and outperform existing methods in large-scale model compression. Our proposed method has important practical implications in fields such as finance, healthcare, and energy, where both accurate point estimates and prediction uncertainty quantification are of concern.
    Robust Representation Learning via Asymmetric Negative Contrast and Reverse Attention. (arXiv:2310.03358v1 [cs.CV])
    Deep neural networks are vulnerable to adversarial noise. Adversarial training (AT) has been demonstrated to be the most effective defense strategy to protect neural networks from being fooled. However, we find that AT omits learning robust features, resulting in poor adversarial robustness. To address this issue, we highlight two characteristics of robust representation: (1) $\bf{exclusion}$: the feature of natural examples keeps away from that of other classes; (2) $\bf{alignment}$: the features of natural and corresponding adversarial examples are close to each other. These motivate us to propose a generic framework of AT to gain robust representation, by the asymmetric negative contrast and reverse attention. Specifically, we design an asymmetric negative contrast based on predicted probabilities, to push away examples of different classes in the feature space. Moreover, we propose to weight features by the parameters of the linear classifier as reverse attention, to obtain class-aware features and pull features of the same class closer. Empirical evaluations on three benchmark datasets show our methods greatly advance the robustness of AT and achieve state-of-the-art performance. Code is available at .
    Colossal-AI: A Unified Deep Learning System For Large-Scale Parallel Training. (arXiv:2110.14883v3 [cs.LG] UPDATED)
    The success of Transformer models has pushed the deep learning model scale to billions of parameters. However, due to the limited memory resource of a single GPU, the best practice for choosing the optimal parallel strategy is still lacking, since it requires domain expertise in both deep learning and parallel computing. The Colossal-AI system addresses this challenge by introducing a unified interface to scale sequential model training code to distributed environments. It supports parallel training methods such as data, pipeline, tensor, and sequence parallelism, as well as heterogeneous training methods integrated with the zero redundancy optimizer. Compared to the baseline system, Colossal-AI can achieve up to 2.76 times training speedup on large-scale models.
    Interpolating between Clustering and Dimensionality Reduction with Gromov-Wasserstein. (arXiv:2310.03398v1 [cs.LG])
    We present a versatile adaptation of existing dimensionality reduction (DR) objectives, enabling the simultaneous reduction of both sample and feature sizes. Correspondences between input and embedding samples are computed through a semi-relaxed Gromov-Wasserstein optimal transport (OT) problem. When the embedding sample size matches that of the input, our model recovers classical popular DR models. When the embedding's dimensionality is unconstrained, we show that the OT plan delivers a competitive hard clustering. We emphasize the importance of intermediate stages that blend DR and clustering for summarizing real data and apply our method to visualize datasets of images.
    Multi-Resolution Audio-Visual Feature Fusion for Temporal Action Localization. (arXiv:2310.03456v1 [cs.CV])
    Temporal Action Localization (TAL) aims to identify actions' start, end, and class labels in untrimmed videos. While recent advancements using transformer networks and Feature Pyramid Networks (FPN) have enhanced visual feature recognition in TAL tasks, less progress has been made in the integration of audio features into such frameworks. This paper introduces the Multi-Resolution Audio-Visual Feature Fusion (MRAV-FF), an innovative method to merge audio-visual data across different temporal resolutions. Central to our approach is a hierarchical gated cross-attention mechanism, which discerningly weighs the importance of audio information at diverse temporal scales. Such a technique not only refines the precision of regression boundaries but also bolsters classification confidence. Importantly, MRAV-FF is versatile, making it compatible with existing FPN TAL architectures and offering a significant enhancement in performance when audio data is available.
    FedHyper: A Universal and Robust Learning Rate Scheduler for Federated Learning with Hypergradient Descent. (arXiv:2310.03156v1 [cs.LG])
    The theoretical landscape of federated learning (FL) undergoes rapid evolution, but its practical application encounters a series of intricate challenges, and hyperparameter optimization is one of these critical challenges. Amongst the diverse adjustments in hyperparameters, the adaptation of the learning rate emerges as a crucial component, holding the promise of significantly enhancing the efficacy of FL systems. In response to this critical need, this paper presents FedHyper, a novel hypergradient-based learning rate adaptation algorithm specifically designed for FL. FedHyper serves as a universal learning rate scheduler that can adapt both global and local rates as the training progresses. In addition, FedHyper not only showcases unparalleled robustness to a spectrum of initial learning rate configurations but also significantly alleviates the necessity for laborious empirical learning rate adjustments. We provide a comprehensive theoretical analysis of FedHyper's convergence rate and conduct extensive experiments on vision and language benchmark datasets. The results demonstrate that FedHyper consistently converges 1.1-3x faster than FedAvg and the competing baselines while achieving superior final accuracy. Moreover, FedHyper catalyzes a remarkable surge in accuracy, augmenting it by up to 15% compared to FedAvg under suboptimal initial learning rate settings.
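    For intuition, the classical hypergradient-descent rule adapts the learning rate using the inner product of consecutive gradients (raise it when they align, lower it otherwise); FedHyper's global and local schedulers build on this idea but are more elaborate, so the snippet below is only a background sketch.

```python
import numpy as np

def hypergradient_lr_step(lr, grad, prev_grad, beta=1e-3):
    """Classical hypergradient learning-rate update: increase lr when
    successive gradients point the same way, decrease it otherwise."""
    lr = lr + beta * float(np.dot(grad.ravel(), prev_grad.ravel()))
    return max(lr, 1e-8)   # keep the rate positive
```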
    On the definition of toxicity in NLP. (arXiv:2310.02357v2 [cs.CL] UPDATED)
    The fundamental problem in the toxicity detection task lies in the fact that toxicity is ill-defined. This causes us to rely on subjective and vague data in models' training, which results in non-robust and inaccurate models: garbage in - garbage out. This work suggests a new, stress-level-based definition of toxicity designed to be objective and context-aware. Alongside it, we also describe possible ways of applying this new definition to dataset creation and model training.
    Towards Understanding the Effect of Pretraining Label Granularity. (arXiv:2303.16887v2 [cs.CV] UPDATED)
    In this paper, we study how the granularity of pretraining labels affects the generalization of deep neural networks in image classification tasks. We focus on the "fine-to-coarse" transfer learning setting, where the pretraining label space is more fine-grained than that of the target problem. Empirically, we show that pretraining on the leaf labels of ImageNet21k produces better transfer results on ImageNet1k than pretraining on other coarser granularity levels, which supports the common practice used in the community. Theoretically, we explain the benefit of fine-grained pretraining by proving that, for a data distribution satisfying certain hierarchy conditions, 1) coarse-grained pretraining only allows a neural network to learn the "common" or "easy-to-learn" features well, while 2) fine-grained pretraining helps the network learn the "rarer" or "fine-grained" features in addition to the common ones, thus improving its accuracy on hard downstream test samples in which common features are missing or weak in strength. Furthermore, we perform comprehensive experiments using the label hierarchies of iNaturalist 2021 and observe that the following conditions, in addition to proper choice of label granularity, enable the transfer to work well in practice: 1) the pretraining dataset needs to have a meaningful label hierarchy, and 2) the pretraining and target label functions need to align well.
    Learning Energy Decompositions for Partial Inference of GFlowNets. (arXiv:2310.03301v1 [cs.LG])
    This paper studies generative flow networks (GFlowNets) to sample objects from the Boltzmann energy distribution via a sequence of actions. In particular, we focus on improving GFlowNet with partial inference: training flow functions with the evaluation of the intermediate states or transitions. To this end, the recently developed forward-looking GFlowNet reparameterizes the flow functions based on evaluating the energy of intermediate states. However, such an evaluation of intermediate energies may (i) be too expensive or impossible to evaluate and (ii) even provide misleading training signals under large energy fluctuations along the sequence of actions. To resolve this issue, we propose learning energy decompositions for GFlowNets (LED-GFN). Our main idea is to (i) decompose the energy of an object into learnable potential functions defined on state transitions and (ii) reparameterize the flow functions using the potential functions. In particular, to produce informative local credits, we propose to regularize the potential to change smoothly over the sequence of actions. It is also noteworthy that training GFlowNet with our learned potential can preserve the optimal policy. We empirically verify the superiority of LED-GFN in five problems including the generation of unstructured and maximum independent sets, molecular graphs, and RNA sequences.
    Deep Ridgelet Transform: Voice with Koopman Operator Proves Universality of Formal Deep Networks. (arXiv:2310.03529v1 [cs.LG])
    We identify hidden layers inside a DNN with group actions on the data space, and formulate the DNN as a dual voice transform with respect to Koopman operator, a linear representation of the group action. Based on the group theoretic arguments, particularly by using Schur's lemma, we show a simple proof of the universality of those DNNs.
    Relational Convolutional Networks: A framework for learning representations of hierarchical relations. (arXiv:2310.03240v1 [cs.LG])
    A maturing area of research in deep learning is the development of architectures that can learn explicit representations of relational features. In this paper, we focus on the problem of learning representations of hierarchical relations, proposing an architectural framework we call "relational convolutional networks". Given a sequence of objects, a "multi-dimensional inner product relation" module produces a relation tensor describing all pairwise relations. A "relational convolution" layer then transforms the relation tensor into a sequence of new objects, each describing the relations within some group of objects at the previous layer. Graphlet filters, analogous to filters in convolutional neural networks, represent a template of relations against which the relation tensor is compared at each grouping. Repeating this yields representations of higher-order, hierarchical relations. We present the motivation and details of the architecture, together with a set of experiments to demonstrate how relational convolutional networks can provide an effective framework for modeling relational tasks that have hierarchical structure.
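    A bare-bones version of the multi-dimensional inner product relation module could compute one pairwise-relation channel per learned projection, as sketched below; the graphlet filters and the relational convolution layer itself are omitted, and the projection setup is an assumption.

```python
import torch

def relation_tensor(objects, projections):
    """objects: (n, d) object embeddings; projections: list of (d, k) matrices.
    Returns an (n, n, r) tensor of pairwise inner-product relations,
    one relation channel per projection."""
    channels = []
    for W in projections:
        z = objects @ W                      # (n, k) projected objects
        channels.append(z @ z.T)             # (n, n) pairwise inner products
    return torch.stack(channels, dim=-1)     # (n, n, r)
```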
    The Geometric Structure of Fully-Connected ReLU-Layers. (arXiv:2310.03482v1 [cs.LG])
    We formalize and interpret the geometric structure of $d$-dimensional fully connected ReLU-layers in neural networks. The parameters of a ReLU-layer induce a natural partition of the input domain, such that in each sector of the partition, the ReLU-layer can be greatly simplified. This leads to a geometric interpretation of a ReLU-layer as a projection onto a polyhedral cone followed by an affine transformation, in line with the description in [doi:10.48550/arXiv.1905.08922] for convolutional networks with ReLU activations. Further, this structure facilitates simplified expressions for preimages of the intersection between partition sectors and hyperplanes, which is useful when describing decision boundaries in a classification setting. We investigate this in detail for a feed-forward network with one hidden ReLU-layer, where we provide results on the geometric complexity of the decision boundary generated by such networks, as well as proving that modulo an affine transformation, such a network can only generate $d$ different decision boundaries. Finally, the effect of adding more layers to the network is discussed.
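    The sector structure can be made concrete: within the region determined by an input's activation pattern, the ReLU layer coincides with a fixed affine map, as in this small sketch.

```python
import numpy as np

def relu_layer_local_affine(W, b, x):
    """Within the sector containing x, ReLU(Wx + b) equals the affine map
    x -> (D W) x + D b, where D masks the inactive units."""
    pre = W @ x + b
    pattern = (pre > 0).astype(float)     # activation pattern = sector label
    D = np.diag(pattern)
    return D @ W, D @ b, pattern          # local weight, bias, and sector id
```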
    Towards out-of-distribution generalizable predictions of chemical kinetics properties. (arXiv:2310.03152v1 [cs.LG])
    Machine Learning (ML) techniques have found applications in estimating chemical kinetics properties. With the accumulated drug molecules identified through "AI4drug discovery", the next imperative lies in AI-driven design for high-throughput chemical synthesis processes, with the estimation of properties of unseen reactions with unexplored molecules. To this end, the existing ML approaches for kinetics property prediction are required to be Out-Of-Distribution (OOD) generalizable. In this paper, we categorize the OOD kinetic property prediction into three levels (structure, condition, and mechanism), revealing unique aspects of such problems. Under this framework, we create comprehensive datasets to benchmark (1) the state-of-the-art ML approaches for reaction prediction in the OOD setting and (2) the state-of-the-art graph OOD methods in kinetics property prediction problems. Our results demonstrated the challenges and opportunities in OOD kinetics property prediction. Our datasets and benchmarks can further support research in this direction.
    Leveraging Model-based Trees as Interpretable Surrogate Models for Model Distillation. (arXiv:2310.03112v1 [stat.ML])
    Surrogate models play a crucial role in retrospectively interpreting complex and powerful black box machine learning models via model distillation. This paper focuses on using model-based trees as surrogate models which partition the feature space into interpretable regions via decision rules. Within each region, interpretable models based on additive main effects are used to approximate the behavior of the black box model, striking an optimal balance between interpretability and performance. Four model-based tree algorithms, namely SLIM, GUIDE, MOB, and CTree, are compared regarding their ability to generate such surrogate models. We investigate fidelity, interpretability, stability, and the algorithms' capability to capture interaction effects through appropriate splits. Based on our comprehensive analyses, we finally provide an overview of user-specific recommendations.
    Mitigating Pilot Contamination and Enabling IoT Scalability in Massive MIMO Systems. (arXiv:2310.03278v1 [cs.IT])
    Massive MIMO is expected to play an important role in the development of 5G networks. This paper addresses the issue of pilot contamination and scalability in massive MIMO systems. The current practice of reusing orthogonal pilot sequences in adjacent cells leads to difficulty in differentiating incoming inter- and intra-cell pilot sequences. One possible solution is to increase the number of orthogonal pilot sequences, which results in dedicating more space of coherence block to pilot transmission than data transmission. This, in turn, also hinders the scalability of massive MIMO systems, particularly in accommodating a large number of IoT devices within a cell. To overcome these challenges, this paper devises an innovative pilot allocation scheme based on the data transfer patterns of IoT devices. The scheme assigns orthogonal pilot sequences to clusters of devices instead of individual devices, allowing multiple devices to utilize the same pilot for periodically transmitting data. Moreover, we formulate the pilot assignment problem as a graph coloring problem and use the max k-cut graph partitioning approach to overcome the pilot contamination in a multicell massive MIMO system. The proposed scheme significantly improves the spectral efficiency and enables the scalability of massive MIMO systems; for instance, by using ten orthogonal pilot sequences, we are able to accommodate 200 devices with only a 12.5% omission rate.
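    A greedy heuristic for the max k-cut pilot assignment might place each device cluster in the pilot group with which it shares the least interference; the conflict-graph weights and the greedy order below are placeholders, not the paper's exact formulation.

```python
import numpy as np

def greedy_max_k_cut(conflict, k):
    """conflict: (n, n) symmetric interference weights between device clusters.
    Greedily assign each cluster to the pilot group minimizing the interference
    it shares with clusters already using that pilot."""
    n = conflict.shape[0]
    assignment = -np.ones(n, dtype=int)
    for node in np.argsort(-conflict.sum(axis=1)):   # heaviest clusters first
        costs = [conflict[node, assignment == g].sum() for g in range(k)]
        assignment[node] = int(np.argmin(costs))
    return assignment
```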
    Molecule Design by Latent Prompt Transformer. (arXiv:2310.03253v1 [cs.LG])
    This paper proposes a latent prompt Transformer model for solving challenging optimization problems such as molecule design, where the goal is to find molecules with optimal values of a target chemical or biological property that can be computed by existing software. Our proposed model consists of three components. (1) A latent vector whose prior distribution is modeled by a Unet transformation of a Gaussian white noise vector. (2) A molecule generation model that generates the string-based representation of a molecule conditional on the latent vector in (1). We adopt the causal Transformer model that takes the latent vector in (1) as a prompt. (3) A property prediction model that predicts the value of the target property of a molecule based on a non-linear regression on the latent vector in (1). We call the proposed model the latent prompt Transformer model. After initial training of the model on existing molecules and their property values, we then gradually shift the model distribution towards the region that supports desired values of the target property for the purpose of molecule design. Our experiments show that our proposed model achieves state-of-the-art performance on several benchmark molecule design tasks.
    Detecting Electricity Service Equity Issues with Transfer Counterfactual Learning on Large-Scale Outage Datasets. (arXiv:2310.03258v1 [cs.LG])
    Energy justice is a growing area of interest in interdisciplinary energy research. However, identifying systematic biases in the energy sector remains challenging due to confounding variables, intricate heterogeneity in treatment effects, and limited data availability. To address these challenges, we introduce a novel approach for counterfactual causal analysis centered on energy justice. We use subgroup analysis to manage diverse factors and leverage the idea of transfer learning to mitigate data scarcity in each subgroup. In our numerical analysis, we apply our method to a large-scale customer-level power outage data set and investigate the counterfactual effect of demographic factors, such as income and age of the population, on power outage durations. Our results indicate that low-income and elderly-populated areas consistently experience longer power outages, regardless of weather conditions. This points to existing biases in the power system and highlights the need for focused improvements in areas with economic challenges.
    Raze to the Ground: Query-Efficient Adversarial HTML Attacks on Machine-Learning Phishing Webpage Detectors. (arXiv:2310.03166v1 [cs.CR])
    Machine-learning phishing webpage detectors (ML-PWD) have been shown to suffer from adversarial manipulations of the HTML code of the input webpage. Nevertheless, the attacks recently proposed have demonstrated limited effectiveness because they do not optimize the usage of the adopted manipulations and focus solely on specific elements of the HTML code. In this work, we overcome these limitations by first designing a novel set of fine-grained manipulations which allow modifying the HTML code of the input phishing webpage without compromising its maliciousness and visual appearance, i.e., the manipulations are functionality- and rendering-preserving by design. We then select which manipulations should be applied to bypass the target detector by a query-efficient black-box optimization algorithm. Our experiments show that our attacks are able to raze to the ground the performance of current state-of-the-art ML-PWD using just 30 queries, thus overcoming the weaker attacks developed in previous work, and enabling a much fairer robustness evaluation of ML-PWD.
    History Matching for Geological Carbon Storage using Data-Space Inversion with Spatio-Temporal Data Parameterization. (arXiv:2310.03228v1 [cs.LG])
    History matching based on monitoring data will enable uncertainty reduction, and thus improved aquifer management, in industrial-scale carbon storage operations. In traditional model-based data assimilation, geomodel parameters are modified to force agreement between flow simulation results and observations. In data-space inversion (DSI), history-matched quantities of interest, e.g., posterior pressure and saturation fields conditioned to observations, are inferred directly, without constructing posterior geomodels. This is accomplished efficiently using a set of O(1000) prior simulation results, data parameterization, and posterior sampling within a Bayesian setting. In this study, we develop and implement (in DSI) a deep-learning-based parameterization to represent spatio-temporal pressure and CO2 saturation fields at a set of time steps. The new parameterization uses an adversarial autoencoder (AAE) for dimension reduction and a convolutional long short-term memory (convLSTM) network to represent the spatial distribution and temporal evolution of the pressure and saturation fields. This parameterization is used with an ensemble smoother with multiple data assimilation (ESMDA) in the DSI framework to enable posterior predictions. A realistic 3D system characterized by prior geological realizations drawn from a range of geological scenarios is considered. A local grid refinement procedure is introduced to estimate the error covariance term that appears in the history matching formulation. Extensive history matching results are presented for various quantities, for multiple synthetic true models. Substantial uncertainty reduction in posterior pressure and saturation fields is achieved in all cases. The framework is applied to efficiently provide posterior predictions for a range of error covariance specifications. Such an assessment would be expensive using a model-based approach.
    Maximum Likelihood Estimation of Latent Variable Structural Equation Models: A Neural Network Approach. (arXiv:2309.14073v2 [stat.ML] UPDATED)
    We propose a graphical structure for structural equation models that is stable under marginalization under linearity and Gaussianity assumptions. We show that computing the maximum likelihood estimation of this model is equivalent to training a neural network. We implement a GPU-based algorithm that computes the maximum likelihood estimation of these models.
    Knowledge Distillation Under Ideal Joint Classifier Assumption. (arXiv:2304.11004v2 [cs.LG] UPDATED)
    Knowledge distillation constitutes a potent methodology for condensing substantial neural networks into more compact and efficient counterparts. Within this context, softmax regression representation learning serves as a widely embraced approach, leveraging a pre-established teacher network to guide the learning process of a diminutive student network. Notably, despite the extensive inquiry into the efficacy of softmax regression representation learning, the intricate underpinnings governing the knowledge transfer mechanism remain inadequately elucidated. This study introduces the 'Ideal Joint Classifier Knowledge Distillation' (IJCKD) framework, an overarching paradigm that not only furnishes a lucid and exhaustive comprehension of prevailing knowledge distillation techniques but also establishes a theoretical underpinning for prospective investigations. Employing mathematical methodologies derived from domain adaptation theory, this investigation conducts a comprehensive examination of the error boundary of the student network contingent upon the teacher network. Consequently, our framework facilitates efficient knowledge transference between teacher and student networks, thereby accommodating a diverse spectrum of applications.
    PoseAction: Action Recognition for Patients in the Ward using Deep Learning Approaches. (arXiv:2310.03288v1 [cs.CV])
    Real-time intelligent detection and prediction of subjects' behavior, particularly their movements or actions, is critical in the ward. This approach offers the advantage of reducing in-hospital care costs and improving the efficiency of healthcare workers, which is especially true for scenarios at night or during peak admission periods. Therefore, in this work, we propose using computer vision (CV) and deep learning (DL) methods for detecting subjects and recognizing their actions. We utilize OpenPose as an accurate subject detector for recognizing the positions of human subjects in the video stream. Additionally, we employ AlphAction's Asynchronous Interaction Aggregation (AIA) network to predict the actions of detected subjects. This integrated model is referred to as PoseAction. At the same time, the proposed model is further trained to predict 12 common actions in ward areas, such as staggering, chest pain, and falling down, using medical-related video clips from the NTU RGB+D and NTU RGB+D 120 datasets. The results demonstrate that PoseAction achieves the highest classification mAP of 98.72% (IoU@0.5). Additionally, this study develops an online real-time mode for action recognition, which strongly supports the clinical translation of PoseAction. Furthermore, using OpenPose's function for recognizing face key points, we also implement face blurring, which is a practical solution to address the privacy protection concerns of patients and healthcare workers. Nevertheless, the training data for PoseAction is currently limited, particularly in terms of label diversity. Consequently, the subsequent step involves utilizing a more diverse dataset (including general actions) to train the model's parameters for improved generalization.
    Memoria: Hebbian Memory Architecture for Human-Like Sequential Processing. (arXiv:2310.03052v1 [cs.LG])
    Transformers have demonstrated their success in various domains and tasks. However, Transformers struggle with long input sequences due to their limited capacity. While one solution is to increase input length, endlessly stretching the length is unrealistic. Furthermore, humans selectively remember and use only relevant information from inputs, unlike Transformers which process all raw data from start to end. We introduce Memoria, a general memory network that applies Hebbian theory, a major theory explaining human memory formation, to enhance long-term dependencies in neural networks. Memoria stores and retrieves information called engrams at multiple memory levels of working memory, short-term memory, and long-term memory, using connection weights that change according to Hebb's rule. Through experiments with popular Transformer-based models like BERT and GPT, we show that Memoria significantly improves the ability to consider long-term dependencies in various tasks. Results show that Memoria outperformed existing methodologies in sorting, language modeling, and long text classification.
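    For reference, Hebb's rule itself is a one-line outer-product update that strengthens connections between co-active units; Memoria's multi-level memory mechanics (working, short-term, and long-term stores) are not shown here.

```python
import numpy as np

def hebbian_update(weights, pre, post, lr=0.1):
    """Hebb's rule: connections between co-active units are strengthened.
    pre: (n,) presynaptic activations, post: (m,) postsynaptic activations,
    weights: (m, n) connection matrix."""
    return weights + lr * np.outer(post, pre)
```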
    DP-SGD for non-decomposable objective functions. (arXiv:2310.03104v1 [cs.LG])
    Unsupervised pre-training is a common step in developing computer vision models and large language models. In this setting, the absence of labels requires the use of similarity-based loss functions, such as contrastive loss, that favor minimizing the distance between similar inputs and maximizing the distance between distinct inputs. As privacy concerns mount, training these models using differential privacy has become more important. However, due to how inputs are generated for these losses, one of their undesirable properties is that their $L_2$ sensitivity can grow with increasing batch size. This property is particularly disadvantageous for differentially private training methods, such as DP-SGD. To overcome this issue, we develop a new DP-SGD variant for similarity-based loss functions -- in particular the commonly used contrastive loss -- that manipulates gradients of the objective function in a novel way to obtain a sensitivity of the summed gradient that is $O(1)$ for batch size $n$. We test our DP-SGD variant on some preliminary CIFAR-10 pre-training and CIFAR-100 finetuning tasks and show that, in both tasks, our method's performance comes close to that of a non-private model and generally outperforms DP-SGD applied directly to the contrastive loss.
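    For background, a vanilla DP-SGD step clips per-example gradients and adds Gaussian noise, which is well-defined only for decomposable (per-example) losses; this baseline is exactly what the paper improves upon for contrastive objectives, so the sketch below does not reproduce the proposed variant.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_std, rng):
    """Vanilla DP-SGD: clip each per-example gradient, sum, add noise, step."""
    clipped = []
    for g in per_example_grads:
        scale = min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
        clipped.append(g * scale)
    noisy_sum = np.sum(clipped, axis=0) + rng.normal(
        0.0, noise_std * clip_norm, size=params.shape)
    return params - lr * noisy_sum / len(per_example_grads)
```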
    How Prevalent is Gender Bias in ChatGPT? -- Exploring German and English ChatGPT Responses. (arXiv:2310.03031v1 [cs.CL])
    With the introduction of ChatGPT, OpenAI made large language models (LLM) accessible to users with limited IT expertise. However, users with no background in natural language processing (NLP) might lack a proper understanding of LLMs, and thus be unaware of their inherent limitations, and will therefore take the systems' output at face value. In this paper, we systematically analyse prompts and the generated responses to identify possible problematic issues with a special focus on gender biases, which users need to be aware of when processing the system's output. We explore how ChatGPT reacts in English and German if prompted to answer from a female, male, or neutral perspective. In an in-depth investigation, we examine selected prompts and analyse to what extent responses differ if the system is prompted several times in an identical way. On this basis, we show that ChatGPT is indeed useful for helping non-IT users draft texts for their daily work. However, it is absolutely crucial to thoroughly check the system's responses for biases as well as for syntactic and grammatical mistakes.
    Multi-modal Gaussian Process Variational Autoencoders for Neural and Behavioral Data. (arXiv:2310.03111v1 [cs.LG])
    Characterizing the relationship between neural population activity and behavioral data is a central goal of neuroscience. While latent variable models (LVMs) are successful in describing high-dimensional time-series data, they are typically only designed for a single type of data, making it difficult to identify structure shared across different experimental data modalities. Here, we address this shortcoming by proposing an unsupervised LVM which extracts temporally evolving shared and independent latents for distinct, simultaneously recorded experimental modalities. We do this by combining Gaussian Process Factor Analysis (GPFA), an interpretable LVM for neural spiking data with temporally smooth latent space, with Gaussian Process Variational Autoencoders (GP-VAEs), which similarly use a GP prior to characterize correlations in a latent space, but admit rich expressivity due to a deep neural network mapping to observations. We achieve interpretability in our model by partitioning latent variability into components that are either shared between or independent to each modality. We parameterize the latents of our model in the Fourier domain, and show improved latent identification using this approach over standard GP-VAE methods. We validate our model on simulated multi-modal data consisting of Poisson spike counts and MNIST images that scale and rotate smoothly over time. We show that the multi-modal GP-VAE (MM-GPVAE) is able to not only identify the shared and independent latent structure across modalities accurately, but provides good reconstructions of both images and neural rates on held-out trials. Finally, we demonstrate our framework on two real world multi-modal experimental settings: Drosophila whole-brain calcium imaging alongside tracked limb positions, and Manduca sexta spike train measurements from ten wing muscles as the animal tracks a visual stimulus.
    PDR-CapsNet: an Energy-Efficient Parallel Approach to Dynamic Routing in Capsule Networks. (arXiv:2310.03212v1 [cs.LG])
    Convolutional Neural Networks (CNNs) have produced state-of-the-art results for image classification tasks. However, they are limited in their ability to handle rotational and viewpoint variations due to information loss in max-pooling layers. Capsule Networks (CapsNets) employ a computationally-expensive iterative process referred to as dynamic routing to address these issues. CapsNets, however, often fall short on complex datasets and require more computational resources than CNNs. To overcome these challenges, we introduce the Parallel Dynamic Routing CapsNet (PDR-CapsNet), a deeper and more energy-efficient alternative to CapsNet that offers superior performance, less energy consumption, and lower overfitting rates. By leveraging a parallelization strategy, PDR-CapsNet mitigates the computational complexity of CapsNet and increases throughput, efficiently using hardware resources. As a result, we achieve 83.55\% accuracy while requiring 87.26\% fewer parameters, 32.27\% fewer MACs, and 47.40\% fewer FLOPs, achieving 3x faster inference and 7.29J less energy consumption on a 2080Ti GPU with 11GB VRAM compared to CapsNet on the CIFAR-10 dataset.
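    As a reference point, the iterative routing-by-agreement procedure that PDR-CapsNet parallelizes looks roughly like the following standard sketch; the parallel branch structure and energy optimizations of PDR-CapsNet are not shown.

```python
import torch
import torch.nn.functional as F

def squash(s, dim=-1, eps=1e-8):
    n2 = (s ** 2).sum(dim=dim, keepdim=True)
    return (n2 / (1.0 + n2)) * s / torch.sqrt(n2 + eps)

def dynamic_routing(u_hat, n_iters=3):
    """u_hat: (batch, n_in, n_out, d) prediction vectors from lower capsules.
    Standard routing-by-agreement; PDR-CapsNet restructures this loop to run
    routing branches in parallel (not shown here)."""
    b = torch.zeros(u_hat.shape[:3], device=u_hat.device)     # routing logits
    for _ in range(n_iters):
        c = F.softmax(b, dim=2)                                # coupling coefficients
        v = squash((c.unsqueeze(-1) * u_hat).sum(dim=1))       # (batch, n_out, d)
        b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)           # agreement update
    return v
```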
    Regret Analysis of Distributed Online Control for LTI Systems with Adversarial Disturbances. (arXiv:2310.03206v1 [math.OC])
    This paper addresses the distributed online control problem over a network of linear time-invariant (LTI) systems (with possibly unknown dynamics) in the presence of adversarial perturbations. There exists a global network cost that is characterized by a time-varying convex function, which evolves in an adversarial manner and is sequentially and partially observed by local agents. The goal of each agent is to generate a control sequence that can compete with the best centralized control policy in hindsight, which has access to the global cost. This problem is formulated as a regret minimization. For the case of known dynamics, we propose a fully distributed disturbance feedback controller that guarantees a regret bound of $O(\sqrt{T}\log T)$, where $T$ is the time horizon. For the unknown dynamics case, we design a distributed explore-then-commit approach, where in the exploration phase all agents jointly learn the system dynamics, and in the learning phase our proposed control algorithm is applied using each agent's system estimate. We establish a regret bound of $O(T^{2/3} \text{poly}(\log T))$ for this setting.
    Sharpness-Aware Minimization and the Edge of Stability. (arXiv:2309.12488v3 [cs.LG] UPDATED)
    Recent experiments have shown that, often, when training a neural network with gradient descent (GD) with a step size $\eta$, the operator norm of the Hessian of the loss grows until it approximately reaches $2/\eta$, after which it fluctuates around this value. The quantity $2/\eta$ has been called the "edge of stability" based on consideration of a local quadratic approximation of the loss. We perform a similar calculation to arrive at an "edge of stability" for Sharpness-Aware Minimization (SAM), a variant of GD which has been shown to improve its generalization. Unlike the case for GD, the resulting SAM-edge depends on the norm of the gradient. Using three deep learning training tasks, we see empirically that SAM operates on the edge of stability identified by this analysis.
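    As a rough illustration of the quantity being tracked, the sketch below (not the paper's code; `model`, `loss_fn`, and `eta` are placeholders) estimates the operator norm of the loss Hessian with power iteration on Hessian-vector products and compares it to the GD edge-of-stability threshold $2/\eta$; the SAM-specific edge derived in the paper additionally involves the gradient norm.

```python
# Rough sketch (not the paper's code): estimate the operator norm of the loss
# Hessian via power iteration on Hessian-vector products and compare it to the
# GD edge-of-stability threshold 2/eta. `model`, `loss_fn`, and `eta` are
# placeholders; the SAM-specific edge additionally involves the gradient norm.
import torch

def hessian_operator_norm(loss, params, iters=20):
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat_grad = torch.cat([g.reshape(-1) for g in grads])
    v = torch.randn_like(flat_grad)
    v /= v.norm()
    eig = torch.tensor(0.0)
    for _ in range(iters):
        hv = torch.autograd.grad(flat_grad @ v, params, retain_graph=True)
        hv = torch.cat([h.reshape(-1) for h in hv])
        eig = v @ hv                     # Rayleigh quotient estimate
        v = hv / (hv.norm() + 1e-12)
    return float(eig)

# illustrative usage:
# lam_max = hessian_operator_norm(loss_fn(model(x), y), list(model.parameters()))
# at_edge = lam_max >= 2.0 / eta
```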
    Synergistic Fusion of Graph and Transformer Features for Enhanced Molecular Property Prediction. (arXiv:2310.03027v1 [physics.chem-ph])
    Molecular property prediction is a critical task in computational drug discovery. While recent advances in Graph Neural Networks (GNNs) and Transformers have been shown to be effective and promising, they face the following limitations: Transformer self-attention does not explicitly consider the underlying molecule structure, while GNN feature representation alone is not sufficient to capture granular and hidden interactions and characteristics that distinguish similar molecules. To address these limitations, we propose SYN-FUSION, a novel approach that synergistically combines pre-trained features from GNNs and Transformers. This approach provides a comprehensive molecular representation, capturing both the global molecule structure and the individual atom characteristics. Experimental results on MoleculeNet benchmarks demonstrate superior performance, surpassing previous models in 5 out of 7 classification datasets and 4 out of 6 regression datasets. The performance of SYN-FUSION has been compared with other Graph-Transformer models that have been jointly trained using a combination of transformer and graph features, and it is found that our approach is on par with those models in terms of performance. Extensive analysis of the learned fusion model across aspects such as loss, latent space, and weight distribution further validates the effectiveness of SYN-FUSION. Finally, an ablation study unequivocally demonstrates that the synergy achieved by SYN-FUSION surpasses the performance of its individual model components and their ensemble, offering a substantial improvement in predicting molecular properties.
    Discovering Knowledge-Critical Subnetworks in Pretrained Language Models. (arXiv:2310.03084v1 [cs.CL])
    Pretrained language models (LMs) encode implicit representations of knowledge in their parameters. However, localizing these representations and disentangling them from each other remains an open problem. In this work, we investigate whether pretrained language models contain various knowledge-critical subnetworks: particular sparse computational subgraphs responsible for encoding specific knowledge the model has memorized. We propose a multi-objective differentiable weight masking scheme to discover these subnetworks and show that we can use them to precisely remove specific knowledge from models while minimizing adverse effects on the behavior of the original language model. We demonstrate our method on multiple GPT2 variants, uncovering highly sparse subnetworks (98%+) that are solely responsible for specific collections of relational knowledge. When these subnetworks are removed, the remaining network maintains most of its initial capacity (modeling language and other memorized relational knowledge) but struggles to express the removed knowledge, and suffers performance drops on examples needing this removed knowledge on downstream tasks after finetuning.
    Assessment of Prediction Intervals Using Uncertainty Characteristics Curves. (arXiv:2310.03158v1 [cs.LG])
    Accurate quantification of model uncertainty has long been recognized as a fundamental requirement for trusted AI. In regression tasks, uncertainty is typically quantified using prediction intervals calibrated to an ad-hoc operating point, making evaluation and comparison across different studies relatively difficult. Our work leverages: (1) the concept of operating characteristics curves and (2) the notion of a gain over a null reference, to derive a novel operating point agnostic assessment methodology for prediction intervals. The paper defines the Uncertainty Characteristics Curve and demonstrates its utility in selected scenarios. We argue that the proposed method addresses the current need for comprehensive assessment of prediction intervals and thus represents a valuable addition to the uncertainty quantification toolbox.
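    For intuition only, the sketch below is an assumed construction (not the paper's exact definition of the curve): it sweeps the scale of Gaussian prediction intervals and records (mean width, empirical coverage) pairs, which yields an operating-point-agnostic view in the spirit of the Uncertainty Characteristics Curve.

```python
# Hedged sketch (an assumed construction, not the paper's exact definition):
# sweep the scale of Gaussian prediction intervals and record (mean width,
# empirical coverage) pairs, giving an operating-point-agnostic curve.
import numpy as np

def uncertainty_curve(y_true, y_pred, y_std, scales=np.linspace(0.1, 3.0, 30)):
    points = []
    for s in scales:
        half_width = s * y_std                       # scaled interval half-width
        covered = np.abs(y_true - y_pred) <= half_width
        points.append((2 * half_width.mean(), covered.mean()))
    return np.array(points)                          # columns: width, coverage

# illustrative usage with synthetic data
rng = np.random.default_rng(0)
y_true = rng.normal(size=500)
y_pred = y_true + rng.normal(scale=0.3, size=500)
curve = uncertainty_curve(y_true, y_pred, np.full(500, 0.3))
```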
    Physics-Informed Neural Networks for Accelerating Power System State Estimation. (arXiv:2310.03088v1 [cs.LG])
    State estimation is the cornerstone of the power system control center since it provides the operating condition of the system in consecutive time intervals. This work investigates the application of physics-informed neural networks (PINNs) for accelerating power system state estimation in monitoring the operation of power systems. Traditional state estimation techniques often rely on iterative algorithms that can be computationally intensive, particularly for large-scale power systems. In this paper, a novel approach that leverages the inherent physical knowledge of power systems through the integration of PINNs is proposed. By incorporating physical laws as prior knowledge, the proposed method significantly reduces the computational complexity associated with state estimation while maintaining high accuracy. The proposed method achieves up to an 11% increase in accuracy, a 75% reduction in the standard deviation of results, and 30% faster convergence, as demonstrated by comprehensive experiments on the IEEE 14-bus system.
    Creating an Atlas of Normal Tissue for Pruning WSI Patching Through Anomaly Detection. (arXiv:2310.03106v1 [eess.IV])
    Patching gigapixel whole slide images (WSIs) is an important task in computational pathology. Some methods have been proposed to select a subset of patches as WSI representation for downstream tasks. While most of the computational pathology tasks are designed to classify or detect the presence of pathological lesions in each WSI, the confounding role and redundant nature of normal histology in tissue samples are generally overlooked in WSI representations. In this paper, we propose and validate the concept of an "atlas of normal tissue" solely using samples of WSIs obtained from normal tissue biopsies. Such atlases can be employed to eliminate normal fragments of tissue samples and hence increase the representativeness of the collection of patches. We tested our proposed method by establishing a normal atlas using 107 normal skin WSIs and demonstrated how established indexes and search engines like Yottixel can be improved. We used 553 WSIs of cutaneous squamous cell carcinoma (cSCC) to show the advantage. We also validated our method applied to an external dataset of 451 breast WSIs. The number of selected WSI patches was reduced by 30% to 50% after utilizing the proposed normal atlas while maintaining the same indexing and search performance in leave-one-patient-out validation for both datasets. The proposed normal atlas shows promise for unsupervised selection of the most representative patches of the abnormal/malignant WSI lesions.
    Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning. (arXiv:2310.03094v1 [cs.CL])
    Large language models (LLMs) such as GPT-4 have exhibited remarkable performance in a variety of tasks, but this strong performance often comes with the high expense of using paid API services. In this paper, we are motivated to study building an LLM cascade to save the cost of using LLMs, particularly for performing reasoning (e.g., mathematical, causal) tasks. Our cascade pipeline follows the intuition that simpler questions can be addressed by a weaker but more affordable LLM, whereas only the challenging questions necessitate the stronger and more expensive LLM. To realize this decision-making, we consider the "answer consistency" of the weaker LLM as a signal of the question difficulty and propose several methods for the answer sampling and consistency checking, including one leveraging a mixture of two thought representations (i.e., Chain-of-Thought and Program-of-Thought). Through experiments on six reasoning benchmark datasets, with GPT-3.5-turbo and GPT-4 being the weaker and stronger LLMs, respectively, we demonstrate that our proposed LLM cascades can achieve performance comparable to using solely the stronger LLM but require only 40% of its cost.
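    A minimal sketch of the routing logic described above is given below; call_weak_llm and call_strong_llm are placeholders rather than real APIs, and the consistency threshold is illustrative. The cheaper model is sampled several times, its answer consistency serves as a difficulty signal, and only inconsistent (hard) questions are escalated to the stronger model.

```python
# Hedged sketch of the cascade routing logic only (call_weak_llm and
# call_strong_llm are placeholders, not real APIs): sample the cheaper model
# several times, use answer consistency as a difficulty signal, and escalate
# only the inconsistent questions to the stronger model.
from collections import Counter

def cascade_answer(question, call_weak_llm, call_strong_llm,
                   n_samples=5, consistency_threshold=0.8):
    samples = [call_weak_llm(question) for _ in range(n_samples)]
    answer, votes = Counter(samples).most_common(1)[0]
    if votes / n_samples >= consistency_threshold:
        return answer                   # weak model is consistent enough
    return call_strong_llm(question)    # escalate the hard question
```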
    QuATON: Quantization Aware Training of Optical Neurons. (arXiv:2310.03049v1 [cs.LG])
    Optical neural architectures (ONAs) use coding elements with optimized physical parameters to perform intelligent measurements. However, fabricating ONAs while maintaining design performances is challenging. Limitations in fabrication techniques often limit the realizable precision of the trained parameters. Physical constraints may also limit the range of values the physical parameters can hold. Thus, ONAs should be trained within the implementable constraints. However, such physics-based constraints reduce the training objective to a constrained optimization problem, making it harder to optimize with existing gradient-based methods. To alleviate these critical issues that degrade performance from simulation to realization we propose a physics-informed quantization-aware training framework. Our approach accounts for the physical constraints during the training process, leading to robust designs. We evaluate our approach on an ONA proposed in the literature, named a diffractive deep neural network (D2NN), for all-optical phase imaging and for classification of phase objects. With extensive experiments on different quantization levels and datasets, we show that our approach leads to ONA designs that are robust to quantization noise.
    Enhancing Accuracy in Deep Learning Using Random Matrix Theory. (arXiv:2310.03165v1 [cs.LG])
    In this study, we explore the applications of random matrix theory (RMT) in the training of deep neural networks (DNNs), focusing on layer pruning to simplify DNN architecture and loss landscape. RMT, recently used to address overfitting in deep learning, enables the examination of DNN's weight layer spectra. We use these techniques to optimally determine the number of singular values to be removed from the weight layers of a DNN during training via singular value decomposition (SVD). This process aids in DNN simplification and accuracy enhancement, as evidenced by training simple DNN models on the MNIST and Fashion MNIST datasets. Our method can be applied to any fully connected or convolutional layer of a pretrained DNN, decreasing the layer's parameters and simplifying the DNN architecture while preserving or even enhancing the model's accuracy. By discarding small singular values based on RMT criteria, the accuracy of the test set remains consistent, facilitating more efficient DNN training without compromising performance. We provide both theoretical and empirical evidence supporting our claim that the elimination of small singular values based on RMT does not negatively impact the DNN's accuracy. Our results offer valuable insights into the practical application of RMT for the creation of more efficient and accurate deep-learning models.
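    As a hedged illustration of the idea (the paper's exact RMT criterion may differ), the sketch below discards singular values of a weight matrix that fall below a Marchenko-Pastur-style noise edge, keeping only the "signal" part of the spectrum; the noise-scale estimate used here is deliberately crude.

```python
# Hedged sketch (the paper's exact RMT criterion may differ): discard singular
# values below a Marchenko-Pastur-style noise edge and keep the signal part.
import numpy as np

def rmt_truncate(W, sigma=None):
    m, n = W.shape
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    if sigma is None:
        sigma = np.median(s) / np.sqrt(max(m, n))     # crude noise-scale guess
    cutoff = sigma * (np.sqrt(m) + np.sqrt(n))        # MP-style upper edge
    keep = s > cutoff
    return (U[:, keep] * s[keep]) @ Vt[keep, :], int(keep.sum())

# illustrative usage: a rank-1 signal buried in small noise
rng = np.random.default_rng(0)
W = 0.01 * rng.normal(size=(256, 128)) + np.outer(rng.normal(size=256), rng.normal(size=128))
W_pruned, rank = rmt_truncate(W)
```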
    The Cadenza ICASSP 2024 Grand Challenge. (arXiv:2310.03480v1 [eess.AS])
    The Cadenza project aims to enhance the audio quality of music for individuals with hearing loss. As part of this, the project is organizing the ICASSP SP Cadenza Challenge: Music Demixing/Remixing for Hearing Aids. The challenge can be tackled by decomposing the music at the hearing aid microphones into vocals, bass, drums, and other components. These can then be intelligently remixed in a personalized manner to improve audio quality. Alternatively, an end-to-end approach could be used. Processes need to consider the music itself, the gain applied to each component, and the listener's hearing loss. The submitted entries will be evaluated using the intrusive objective metric, the Hearing Aid Audio Quality Index (HAAQI). This paper outlines the challenge.
    Safe Exploration in Reinforcement Learning: A Generalized Formulation and Algorithms. (arXiv:2310.03225v1 [cs.LG])
    Safe exploration is essential for the practical use of reinforcement learning (RL) in many real-world scenarios. In this paper, we present a generalized safe exploration (GSE) problem as a unified formulation of common safe exploration problems. We then propose a solution of the GSE problem in the form of a meta-algorithm for safe exploration, MASE, which combines an unconstrained RL algorithm with an uncertainty quantifier to guarantee safety in the current episode while properly penalizing unsafe explorations before actual safety violation to discourage them in future episodes. The advantage of MASE is that we can optimize a policy while guaranteeing with a high probability that no safety constraint will be violated under proper assumptions. Specifically, we present two variants of MASE with different constructions of the uncertainty quantifier: one based on generalized linear models with theoretical guarantees of safety and near-optimality, and another that combines a Gaussian process to ensure safety with a deep RL algorithm to maximize the reward. Finally, we demonstrate that our proposed algorithm achieves better performance than state-of-the-art algorithms on grid-world and Safety Gym benchmarks without violating any safety constraints, even during training.
    TacoGFN: Target Conditioned GFlowNet for Structure-Based Drug Design. (arXiv:2310.03223v1 [cs.LG])
    We seek to automate the generation of drug-like compounds conditioned to specific protein pocket targets. Most current methods approximate the protein-molecule distribution of a finite dataset and, therefore struggle to generate molecules with significant binding improvement over the training dataset. We instead frame the pocket-conditioned molecular generation task as an RL problem and develop TacoGFN, a target conditional Generative Flow Network model. Our method is explicitly encouraged to generate molecules with desired properties as opposed to fitting on a pre-existing data distribution. To this end, we develop transformer-based docking score prediction to speed up docking score computation and propose TacoGFN to explore molecule space efficiently. Furthermore, we incorporate several rounds of active learning where generated samples are queried using a docking oracle to improve the docking score prediction. This approach allows us to accurately explore as much of the molecule landscape as we can afford computationally. Empirically, molecules generated using TacoGFN and its variants significantly outperform all baseline methods across every property (Docking score, QED, SA, Lipinski), while being orders of magnitude faster.
    Formal and Practical Elements for the Certification of Machine Learning Systems. (arXiv:2310.03217v1 [cs.LG])
    Over the past decade, machine learning has demonstrated impressive results, often surpassing human capabilities in sensing tasks relevant to autonomous flight. Unlike traditional aerospace software, the parameters of machine learning models are not hand-coded nor derived from physics but learned from data. They are automatically adjusted during a training phase, and their values do not usually correspond to physical requirements. As a result, requirements cannot be directly traced to lines of code, hindering the current bottom-up aerospace certification paradigm. This paper attempts to address this gap by 1) demystifying the inner workings and processes to build machine learning models, 2) formally establishing theoretical guarantees given by those processes, and 3) complementing these formal elements with practical considerations to develop a complete certification argument for safety-critical machine learning systems. Based on a scalable statistical verifier, our proposed framework is model-agnostic and tool-independent, making it adaptable to many use cases in the industry. We demonstrate results on a widespread application in autonomous flight: vision-based landing.
    Fragment-based Pretraining and Finetuning on Molecular Graphs. (arXiv:2310.03274v1 [cs.LG])
    Property prediction on molecular graphs is an important application of Graph Neural Networks (GNNs). Recently, unlabeled molecular data has become abundant, which facilitates the rapid development of self-supervised learning for GNNs in the chemical domain. In this work, we propose pretraining GNNs at the fragment level, which serves as a promising middle ground to overcome the limitations of node-level and graph-level pretraining. Borrowing techniques from recent work on principal subgraph mining, we obtain a compact vocabulary of prevalent fragments that span a large pretraining dataset. From the extracted vocabulary, we introduce several fragment-based contrastive and predictive pretraining tasks. The contrastive learning task jointly pretrains two different GNNs: one based on molecular graphs and one based on fragment graphs, which represents high-order connectivity within molecules. By enforcing the consistency between the fragment embedding and the aggregated embedding of the corresponding atoms from the molecular graphs, we ensure that both embeddings capture structural information at multiple resolutions. The structural information of the fragment graphs is further exploited to extract auxiliary labels for the graph-level predictive pretraining. We employ both the pretrained molecular-based and fragment-based GNNs for downstream prediction, thus utilizing the fragment information during finetuning. Our models advance the performances on 5 out of 8 common molecular benchmarks and improve the performances on long-range biological benchmarks by at least 11.5%.
    Joint Group Invariant Functions on Data-Parameter Domain Induce Universal Neural Networks. (arXiv:2310.03530v1 [cs.LG])
    The symmetry and geometry of input data are considered to be encoded in the internal data representation inside the neural network, but the specific encoding rule has been less investigated. By focusing on a joint group invariant function on the data-parameter domain, we present a systematic rule to find a dual group action on the parameter domain from a group action on the data domain. Further, we introduce generalized neural networks induced from the joint invariant functions, and present a new group theoretic proof of their universality theorems by using Schur's lemma. Since traditional universality theorems were demonstrated based on functional analytical methods, this study sheds light on the group theoretic aspect of the approximation theory, connecting geometric deep learning to abstract harmonic analysis.
    GPT-MolBERTa: GPT Molecular Features Language Model for molecular property prediction. (arXiv:2310.03030v1 [physics.chem-ph])
    With the emergence of Transformer architectures and their powerful understanding of textual data, a new horizon has opened up to predict molecular properties based on text descriptions. While SMILES are the most common form of representation, they lack robustness, rich information, and canonicity, which limits their effectiveness in becoming generalizable representations. Here, we present GPT-MolBERTa, a self-supervised large language model (LLM) which uses detailed textual descriptions of molecules to predict their properties. Text-based descriptions of 326,000 molecules were collected using ChatGPT and used to train the LLM to learn the representation of molecules. To predict the properties for the downstream tasks, both BERT and RoBERTa models were used in the finetuning stage. Experiments show that GPT-MolBERTa performs well on various molecule property benchmarks and approaches state-of-the-art performance in regression tasks. Additionally, further analysis of the attention mechanisms shows that GPT-MolBERTa is able to pick up important information from the input textual data, demonstrating the interpretability of the model.
    A Deep Reinforcement Learning Approach for Interactive Search with Sentence-level Feedback. (arXiv:2310.03043v1 [cs.LG])
    Interactive search can provide a better experience by incorporating interaction feedback from the users. This can significantly improve search accuracy as it helps avoid irrelevant information and captures the users' search intents. Existing state-of-the-art (SOTA) systems use reinforcement learning (RL) models to incorporate the interactions but focus on item-level feedback, ignoring the fine-grained information found in sentence-level feedback. Yet such feedback requires extensive RL action space exploration and large amounts of annotated data. This work addresses these challenges by proposing a new deep Q-learning (DQ) approach, DQrank. DQrank adapts BERT-based models, the SOTA in natural language processing, to select crucial sentences based on users' engagement and rank the items to obtain more satisfactory responses. We also propose two mechanisms to better explore optimal actions. DQrank further utilizes the experience replay mechanism in DQ to store the feedback sentences to obtain a better initial ranking performance. We validate the effectiveness of DQrank on three search datasets. The results show that DQrank performs at least 12% better than the previous SOTA RL approaches. We also conduct detailed ablation studies. The ablation results demonstrate that each model component can efficiently extract and accumulate long-term engagement effects from the users' sentence-level feedback. This structure offers new technologies with promising performance to construct a search system with sentence-level interaction.
    Differentiable Chemical Physics by Geometric Deep Learning for Gradient-based Property Optimization of Mixtures. (arXiv:2310.03047v1 [physics.chem-ph])
    Chemical mixtures, satisfying multi-objective performance metrics and constraints, enable their use in chemical processes and electrochemical devices. In this work, we develop a differentiable chemical-physics framework for modeling chemical mixtures, DiffMix, where geometric deep learning (GDL) is leveraged to map from molecular species, compositions and environment conditions, to physical coefficients in the mixture physics laws. In particular, we extend mixture thermodynamic and transport laws by creating learnable physical coefficients, where we use graph neural networks as the molecule encoder and enforce component-wise permutation-invariance. We start our model evaluations with the thermodynamics of binary mixtures, and further benchmark multicomponent electrolyte mixtures on their transport properties in order to test the model's generalizability. We show improved prediction accuracy and model robustness of DiffMix compared to its purely data-driven variants. Furthermore, we demonstrate the efficient optimization of electrolyte transport properties, built on the gradients obtained using DiffMix's auto-differentiation. Our simulation runs are then backed up by the data generated by a robotic experimentation setup, Clio. By combining mixture physics and GDL, DiffMix expands the predictive modeling methods for chemical mixtures and provides low-cost optimization approaches in large chemical spaces.
    Modified LAB Algorithm with Clustering-based Search Space Reduction Method for solving Engineering Design Problems. (arXiv:2310.03055v1 [cs.LG])
    A modified LAB algorithm is introduced in this paper. It builds upon the original LAB algorithm (Reddy et al. 2023), which is a socio-inspired algorithm that models competitive and learning behaviours within a group, establishing hierarchical roles. The proposed algorithm incorporates the roulette wheel approach and a reduction factor, introducing inter-group competition and iteratively narrowing down the sample space. The algorithm is validated by solving the benchmark test problems from CEC 2005 and CEC 2017. The solutions are validated using standard statistical tests such as the two-sided and pairwise signed rank Wilcoxon test and the Friedman rank test. The algorithm exhibited improved and superior robustness as well as search space exploration capabilities. Furthermore, a Clustering-Based Search Space Reduction (C-SSR) method is proposed, making the algorithm capable of solving constrained problems. The C-SSR method enables the algorithm to identify clusters of feasible regions, satisfying the constraints and contributing to achieving the optimal solution. This method demonstrates its effectiveness as a potential alternative to traditional constraint handling techniques. The results obtained using the Modified LAB algorithm are then compared with those achieved by other recent metaheuristic algorithms.
    Test Case Recommendations with Distributed Representation of Code Syntactic Features. (arXiv:2310.03174v1 [cs.LG])
    Frequent modifications of unit test cases are inevitable due to software's continuous underlying changes in source code, design, and requirements. Since manually maintaining software test suites is tedious, time-consuming, and costly, automating the process of generation and maintenance of test units will significantly impact the effectiveness and efficiency of software testing processes. To this end, we propose an automated approach which exploits both structural and semantic properties of source code methods and test cases to recommend the most relevant and useful unit tests to the developers. The proposed approach initially trains a neural network to transform method-level source code, as well as unit tests, into distributed representations (embedded vectors) while preserving the importance of the structure in the code. Retrieving the semantic and structural properties of a given method, the approach computes cosine similarity between the method's embedding and the previously-embedded training instances. Further, according to the similarity scores between the embedding vectors, the model identifies the closest methods in the embedding space and their associated unit tests as the most similar recommendations. The results on the Methods2Test dataset showed that, while there is no guarantee that similar methods share relevant test cases, the proposed approach extracts the most similar existing test cases for a given method in the dataset, and evaluations show that the recommended test cases reduce the developers' effort in generating the expected test cases.
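    The retrieval step described above amounts to nearest-neighbour search in the embedding space; the sketch below illustrates only that step (the embedding network is assumed to exist, and the names are illustrative): rank previously embedded methods by cosine similarity to a query method's embedding and return the unit tests attached to the closest ones.

```python
# Hedged sketch of the retrieval step only (the embedding network is assumed):
# rank embedded methods by cosine similarity to a query method and return the
# unit tests attached to the closest ones.
import numpy as np

def recommend_tests(query_vec, method_vecs, method_tests, k=5):
    q = query_vec / np.linalg.norm(query_vec)
    M = method_vecs / np.linalg.norm(method_vecs, axis=1, keepdims=True)
    sims = M @ q                                   # cosine similarities
    top = np.argsort(-sims)[:k]
    return [(method_tests[i], float(sims[i])) for i in top]
```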
    Know2BIO: A Comprehensive Dual-View Benchmark for Evolving Biomedical Knowledge Graphs. (arXiv:2310.03221v1 [cs.LG])
    Knowledge graphs (KGs) have emerged as a powerful framework for representing and integrating complex biomedical information. However, assembling KGs from diverse sources remains a significant challenge in several aspects, including entity alignment, scalability, and the need for continuous updates to keep pace with scientific advancements. Moreover, the representative power of KGs is often limited by the scarcity of multi-modal data integration. To overcome these challenges, we propose Know2BIO, a general-purpose heterogeneous KG benchmark for the biomedical domain. Know2BIO integrates data from 30 diverse sources, capturing intricate relationships across 11 biomedical categories. It currently consists of ~219,000 nodes and ~6,200,000 edges. Know2BIO is capable of user-directed automated updating to reflect the latest knowledge in biomedical science. Furthermore, Know2BIO is accompanied by multi-modal data: node features including text descriptions, protein and compound sequences and structures, enabling the utilization of emerging natural language processing methods and multi-modal data integration strategies. We evaluate KG representation models on Know2BIO, demonstrating its effectiveness as a benchmark for KG representation learning in the biomedical field. Data and source code of Know2BIO are available at https://github.com/Yijia-Xiao/Know2BIO/.
    Point-PEFT: Parameter-Efficient Fine-Tuning for 3D Pre-trained Models. (arXiv:2310.03059v1 [cs.CV])
    The popularity of pre-trained large models has revolutionized downstream tasks across diverse fields, such as language, vision, and multi-modality. To minimize the adaption cost for downstream tasks, many Parameter-Efficient Fine-Tuning (PEFT) techniques are proposed for language and 2D image pre-trained models. However, the specialized PEFT method for 3D pre-trained models is still under-explored. To this end, we introduce Point-PEFT, a novel framework for adapting point cloud pre-trained models with minimal learnable parameters. Specifically, for a pre-trained 3D model, we freeze most of its parameters, and only tune the newly added PEFT modules on downstream tasks, which consist of a Point-prior Prompt and a Geometry-aware Adapter. The Point-prior Prompt adopts a set of learnable prompt tokens, for which we propose to construct a memory bank with domain-specific knowledge, and utilize a parameter-free attention to enhance the prompt tokens. The Geometry-aware Adapter aims to aggregate point cloud features within spatial neighborhoods to capture fine-grained geometric information through local interactions. Extensive experiments indicate that our Point-PEFT can achieve better performance than the full fine-tuning on various downstream tasks, while using only 5% of the trainable parameters, demonstrating the efficiency and effectiveness of our approach. Code will be released at https://github.com/EvenJoker/Point-PEFT.
    Batch-less stochastic gradient descent for compressive learning of deep regularization for image denoising. (arXiv:2310.03085v1 [cs.LG])
    We consider the problem of denoising with the help of prior information taken from a database of clean signals or images. Denoising with variational methods is very efficient if a regularizer well adapted to the nature of the data is available. Thanks to the maximum a posteriori Bayesian framework, such regularizer can be systematically linked with the distribution of the data. With deep neural networks (DNN), complex distributions can be recovered from a large training database. To reduce the computational burden of this task, we adapt the compressive learning framework to the learning of regularizers parametrized by DNN. We propose two variants of stochastic gradient descent (SGD) for the recovery of deep regularization parameters from a heavily compressed database. These algorithms outperform the initially proposed method that was limited to low-dimensional signals, each iteration using information from the whole database. They also benefit from classical SGD convergence guarantees. Thanks to these improvements we show that this method can be applied for patch-based image denoising.
    Crossed-IoT device portability of Electromagnetic Side Channel Analysis: Challenges and Dataset. (arXiv:2310.03119v1 [cs.LG])
    IoT (Internet of Things) refers to the network of interconnected physical devices, vehicles, home appliances, and other items embedded with sensors, software, and connectivity, enabling them to collect and exchange data. IoT Forensics is collecting and analyzing digital evidence from IoT devices to investigate cybercrimes, security breaches, and other malicious activities that may have taken place on these connected devices. In particular, EM-SCA has become an essential tool for IoT forensics due to its ability to reveal confidential information about the internal workings of IoT devices without interfering with these devices or wiretapping their networks. However, the accuracy and reliability of EM-SCA results can be limited by device variability, environmental factors, and data collection and processing methods. Besides, there is very little research on these limitations, which significantly affect the accuracy of EM-SCA approaches for crossed-IoT device portability, and limited research on possible solutions to address this challenge. Therefore, this empirical study examines the impact of device variability on the accuracy and reliability of EM-SCA approaches, in particular machine-learning (ML) based approaches for EM-SCA. We first present the background, basic concepts, and techniques used to evaluate the limitations of current EM-SCA approaches and datasets. Our study then addresses one of the most important limitations, which is caused by the multi-core architecture of the processors (SoC). We present an approach to collect EM-SCA datasets and demonstrate the feasibility of using transfer learning to obtain more meaningful and reliable results from EM-SCA in IoT forensics of crossed-IoT devices. Our study moreover contributes a new dataset for using deep learning models to analyse electromagnetic side-channel data with regard to cross-device portability.
    BID-NeRF: RGB-D image pose estimation with inverted Neural Radiance Fields. (arXiv:2310.03563v1 [cs.CV])
    We aim to improve the Inverted Neural Radiance Fields (iNeRF) algorithm which defines the image pose estimation problem as a NeRF based iterative linear optimization. NeRFs are novel neural space representation models that can synthesize photorealistic novel views of real-world scenes or objects. Our contributions are as follows: we extend the localization optimization objective with a depth-based loss function, we introduce a multi-image based loss function where a sequence of images with known relative poses are used without increasing the computational complexity, we omit hierarchical sampling during volumetric rendering, meaning only the coarse model is used for pose estimation, and we show that by extending the sampling interval, convergence can be achieved even for higher initial pose estimate errors. With the proposed modifications the convergence speed is significantly improved, and the basin of convergence is substantially extended.
    Learning Concept-Based Visual Causal Transition and Symbolic Reasoning for Visual Planning. (arXiv:2310.03325v1 [cs.AI])
    Visual planning simulates how humans make decisions to achieve desired goals in the form of searching for visual causal transitions between an initial visual state and a final visual goal state. It has become increasingly important in egocentric vision with its advantages in guiding agents to perform daily tasks in complex environments. In this paper, we propose an interpretable and generalizable visual planning framework consisting of i) a novel Substitution-based Concept Learner (SCL) that abstracts visual inputs into disentangled concept representations, ii) symbol abstraction and reasoning that performs task planning via the self-learned symbols, and iii) a Visual Causal Transition model (ViCT) that grounds visual causal transitions to semantically similar real-world actions. Given an initial state, we perform goal-conditioned visual planning with a symbolic reasoning method fueled by the learned representations and causal transitions to reach the goal state. To verify the effectiveness of the proposed model, we collect a large-scale visual planning dataset based on AI2-THOR, dubbed as CCTP. Extensive experiments on this challenging dataset demonstrate the superior performance of our method in visual task planning. Empirically, we show that our framework can generalize to unseen task trajectories and unseen object categories.
    LESSON: Learning to Integrate Exploration Strategies for Reinforcement Learning via an Option Framework. (arXiv:2310.03342v1 [cs.LG])
    In this paper, a unified framework for exploration in reinforcement learning (RL) is proposed based on an option-critic model. The proposed framework learns to integrate a set of diverse exploration strategies so that the agent can adaptively select the most effective exploration strategy over time to realize a relevant exploration-exploitation trade-off for each given task. The effectiveness of the proposed exploration framework is demonstrated by various experiments in the MiniGrid and Atari environments.
    Posterior Sampling Based on Gradient Flows of the MMD with Negative Distance Kernel. (arXiv:2310.03054v1 [stat.ML])
    We propose conditional flows of the maximum mean discrepancy (MMD) with the negative distance kernel for posterior sampling and conditional generative modeling. This MMD, which is also known as energy distance, has several advantageous properties like efficient computation via slicing and sorting. We approximate the joint distribution of the ground truth and the observations using discrete Wasserstein gradient flows and establish an error bound for the posterior distributions. Further, we prove that our particle flow is indeed a Wasserstein gradient flow of an appropriate functional. The power of our method is demonstrated by numerical examples including conditional image generation and inverse problems like superresolution, inpainting and computed tomography in low-dose and limited-angle settings.
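    To make the "efficient computation via slicing and sorting" property concrete, the sketch below is an illustrative estimator (not the paper's flow algorithm): for the negative distance kernel the sliced MMD reduces to the energy distance, and each 1-D term can be computed in O(n log n) with sorting and cumulative sums.

```python
# Hedged sketch: sliced energy distance (MMD with the negative distance kernel),
# with each 1-D term computed via sorting and cumulative sums.
import numpy as np

def mean_abs_diff_1d(a, b):
    """E|A - B| for 1-D samples via sorting and cumulative sums."""
    a = np.sort(a)
    csum = np.concatenate(([0.0], np.cumsum(a)))
    idx = np.searchsorted(a, b)
    total = csum[-1]
    s = b * idx - csum[idx] + (total - csum[idx]) - b * (len(a) - idx)
    return s.sum() / (len(a) * len(b))

def sliced_energy_distance(X, Y, n_projections=64, seed=0):
    rng = np.random.default_rng(seed)
    vals = []
    for _ in range(n_projections):
        theta = rng.normal(size=X.shape[1])
        theta /= np.linalg.norm(theta)
        x, y = X @ theta, Y @ theta
        vals.append(2 * mean_abs_diff_1d(x, y)
                    - mean_abs_diff_1d(x, x) - mean_abs_diff_1d(y, y))
    return float(np.mean(vals))

X = np.random.default_rng(1).normal(size=(200, 3))
Y = np.random.default_rng(2).normal(loc=0.5, size=(200, 3))
print(sliced_energy_distance(X, Y))
```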
    Fine-tune Language Models to Approximate Unbiased In-context Learning. (arXiv:2310.03331v1 [cs.LG])
    In-context learning (ICL) is an astonishing emergent ability of large language models (LLMs). By presenting a prompt that includes multiple input-output pairs as examples and introducing a new query input, models can generate the corresponding output. However, the performance of models heavily relies on the quality of the input prompt when implementing in-context learning. Biased or imbalanced input prompts can significantly degrade the performance of language models. To address this issue, we introduce a reweighted algorithm called RICL (Reweighted In-context Learning). This algorithm fine-tunes language models using an unbiased validation set to determine the optimal weight for each input-output example to approximate unbiased in-context learning. Furthermore, we also introduce a low-cost reweighted algorithm, a linear optimal weight approximation algorithm called LARICL (Linear Approximation of Reweighted In-context Learning). This algorithm requires minimal training cost while providing effective results. We prove the convergence of our algorithm and validate its performance through experiments conducted on a numerical dataset. The experimental findings reveal a substantial improvement in comparison to benchmarks including the performance of casual prompt-based in-context learning and the performance of a classic fine-tuning method.
    The Blame Problem in Evaluating Local Explanations, and How to Tackle it. (arXiv:2310.03466v1 [cs.LG])
    The number of local model-agnostic explanation techniques proposed has grown rapidly recently. One main reason is that the bar for developing new explainability techniques is low due to the lack of optimal evaluation measures. Without rigorous measures, it is hard to have concrete evidence of whether the new explanation techniques can significantly outperform their predecessors. Our study proposes a new taxonomy for evaluating local explanations: robustness, evaluation using ground truth from synthetic datasets and interpretable models, model randomization, and human-grounded evaluation. Using this proposed taxonomy, we highlight that all categories of evaluation methods, except those based on the ground truth from interpretable models, suffer from a problem we call the "blame problem." In our study, we argue that this category of evaluation measure is a more reasonable method for evaluating local model-agnostic explanations. However, we show that even this category of evaluation measures has further limitations. The evaluation of local explanations remains an open research problem.
    Certifiably Robust Graph Contrastive Learning. (arXiv:2310.03312v1 [cs.CR])
    Graph Contrastive Learning (GCL) has emerged as a popular unsupervised graph representation learning method. However, it has been shown that GCL is vulnerable to adversarial attacks on both the graph structure and node attributes. Although empirical approaches have been proposed to enhance the robustness of GCL, the certifiable robustness of GCL still remains unexplored. In this paper, we develop the first certifiably robust framework in GCL. Specifically, we first propose a unified criterion to evaluate and certify the robustness of GCL. We then introduce a novel technique, RES (Randomized Edgedrop Smoothing), to ensure certifiable robustness for any GCL model, and this certified robustness can be provably preserved in downstream tasks. Furthermore, an effective training method is proposed for robust GCL. Extensive experiments on real-world datasets demonstrate the effectiveness of our proposed method in providing effective certifiable robustness and enhancing the robustness of any GCL model. The source code of RES is available at https://github.com/ventr1c/RES-GCL.
    SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks. (arXiv:2310.03684v1 [cs.LG])
    Despite efforts to align large language models (LLMs) with human values, widely-used LLMs such as GPT, Llama, Claude, and PaLM are susceptible to jailbreaking attacks, wherein an adversary fools a targeted LLM into generating objectionable content. To address this vulnerability, we propose SmoothLLM, the first algorithm designed to mitigate jailbreaking attacks on LLMs. Based on our finding that adversarially-generated prompts are brittle to character-level changes, our defense first randomly perturbs multiple copies of a given input prompt, and then aggregates the corresponding predictions to detect adversarial inputs. SmoothLLM reduces the attack success rate on numerous popular LLMs to below one percentage point, avoids unnecessary conservatism, and admits provable guarantees on attack mitigation. Moreover, our defense uses exponentially fewer queries than existing attacks and is compatible with any LLM.
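    A minimal sketch of the core idea follows (query_llm and is_jailbroken are placeholders, not real APIs, and the perturbation budget is illustrative): randomly perturb several copies of the prompt at the character level, query the model on each copy, and aggregate by majority vote over whether the responses look jailbroken.

```python
# Hedged sketch of the defense's core loop (query_llm and is_jailbroken are
# placeholders): character-level perturbation of prompt copies plus a majority
# vote over the resulting responses.
import random
import string

def perturb(prompt, swap_fraction=0.1, rng=None):
    rng = rng or random.Random()
    if not prompt:
        return prompt
    chars = list(prompt)
    n_swaps = max(1, int(swap_fraction * len(chars)))
    for i in rng.sample(range(len(chars)), n_swaps):
        chars[i] = rng.choice(string.printable)
    return "".join(chars)

def smoothed_response(prompt, query_llm, is_jailbroken, n_copies=8):
    responses = [query_llm(perturb(prompt)) for _ in range(n_copies)]
    flags = [is_jailbroken(r) for r in responses]
    if sum(flags) > len(flags) / 2:        # majority looks jailbroken
        return "[request blocked]"
    return next(r for r, f in zip(responses, flags) if not f)
```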
    BTDNet: a Multi-Modal Approach for Brain Tumor Radiogenomic Classification. (arXiv:2310.03485v1 [eess.IV])
    Brain tumors pose significant health challenges worldwide, with glioblastoma being one of the most aggressive forms. Accurate determination of the O6-methylguanine-DNA methyltransferase (MGMT) promoter methylation status is crucial for personalized treatment strategies. However, traditional methods are labor-intensive and time-consuming. This paper proposes a novel multi-modal approach, BTDNet, leveraging multi-parametric MRI scans, including FLAIR, T1w, T1wCE, and T2 3D volumes, to predict MGMT promoter methylation status. BTDNet addresses two main challenges: the variable volume lengths (i.e., each volume consists of a different number of slices) and the volume-level annotations (i.e., the whole 3D volume is annotated and not the independent slices that it consists of). BTDNet consists of four components: i) the data augmentation one (that performs geometric transformations, convex combinations of data pairs and test-time data augmentation); ii) the 3D analysis one (that performs global analysis through a CNN-RNN); iii) the routing one (that contains a mask layer that handles variable input feature lengths), and iv) the modality fusion one (that effectively enhances data representation, reduces ambiguities and mitigates data scarcity). The proposed method outperforms by large margins the state-of-the-art methods in the RSNA-ASNR-MICCAI BraTS 2021 Challenge, offering a promising avenue for enhancing brain tumor diagnosis and treatment.
    Otago Exercises Monitoring for Older Adults by a Single IMU and Hierarchical Machine Learning Models. (arXiv:2310.03512v1 [cs.LG])
    Otago Exercise Program (OEP) is a rehabilitation program for older adults to improve frailty, sarcopenia, and balance. Accurate monitoring of patient involvement in OEP is challenging, as self-reports (diaries) are often unreliable. With the development of wearable sensors, Human Activity Recognition (HAR) systems using wearable sensors have revolutionized healthcare. However, their usage for OEP still shows limited performance. The objective of this study is to build an unobtrusive and accurate system to monitor OEP for older adults. Data was collected from older adults wearing a single waist-mounted Inertial Measurement Unit (IMU). Two datasets were collected, one in a laboratory setting, and one at the homes of the patients. A hierarchical system is proposed with two stages: 1) using a deep learning model to recognize whether the patients are performing OEP or activities of daily life (ADLs) using a 10-minute sliding window; 2) based on stage 1, using a 6-second sliding window to recognize the OEP sub-classes performed. The results showed that in stage 1, OEP could be recognized with window-wise f1-scores over 0.95 and Intersection-over-Union (IoU) f1-scores over 0.85 for both datasets. In stage 2, for the home scenario, four activities could be recognized with f1-scores over 0.8: ankle plantarflexors, abdominal muscles, knee bends, and sit-to-stand. The results showed the potential of monitoring the compliance of OEP using a single IMU in daily life. Also, some OEP sub-classes are possible to be recognized for further analysis.
    Variational Inference for GARCH-family Models. (arXiv:2310.03435v1 [stat.ML])
    The Bayesian estimation of GARCH-family models has been typically addressed through Monte Carlo sampling. Variational Inference is gaining popularity and attention as a robust approach for Bayesian inference in complex machine learning models; however, its adoption in econometrics and finance is limited. This paper discusses the extent to which Variational Inference constitutes a reliable and feasible alternative to Monte Carlo sampling for Bayesian inference in GARCH-like models. Through a large-scale experiment involving the constituents of the S&P 500 index, several Variational Inference optimizers, a variety of volatility models, and a case study, we show that Variational Inference is an attractive, remarkably well-calibrated, and competitive method for Bayesian learning.
    Explaining Emergent In-Context Learning as Kernel Regression. (arXiv:2305.12766v2 [cs.CL] UPDATED)
    Large language models (LLMs) have initiated a paradigm shift in transfer learning. In contrast to the classic pretraining-then-finetuning procedure, in order to use LLMs for downstream prediction tasks, one only needs to provide a few demonstrations, known as in-context examples, without adding more or updating existing model parameters. This in-context learning (ICL) capability of LLMs is intriguing, and it is not yet fully understood how pretrained LLMs acquire such capabilities. In this paper, we investigate the reason why a transformer-based language model can accomplish in-context learning after pre-training on a general language corpus by proposing one hypothesis that LLMs can simulate kernel regression with internal representations when faced with in-context examples. More concretely, we first prove that Bayesian inference on in-context prompts can be asymptotically understood as kernel regression $\hat y = \sum_i y_i K(x, x_i)/\sum_i K(x, x_i)$ as the number of in-context demonstrations grows. Then, we empirically investigate the in-context behaviors of language models. We find that during ICL, the attention and hidden features in LLMs match the behaviors of a kernel regression. Finally, our theory provides insights into multiple phenomena observed in the ICL field: why retrieving demonstrative samples similar to test samples can help, why ICL performance is sensitive to the output formats, and why ICL accuracy benefits from selecting in-distribution and representative samples.
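    A worked sketch of the estimator in the hypothesis, $\hat y = \sum_i y_i K(x, x_i)/\sum_i K(x, x_i)$, is given below with a Gaussian kernel over plain feature vectors; in the paper the kernel is induced by the LLM's internal representations, so this is only illustrative.

```python
# Worked sketch of y_hat = sum_i y_i K(x, x_i) / sum_i K(x, x_i) with a
# Gaussian kernel; the demonstration representations here are plain vectors,
# whereas the paper's hypothesis concerns the LLM's internal representations.
import numpy as np

def kernel_regression(x, X_demo, y_demo, bandwidth=1.0):
    dists = np.linalg.norm(X_demo - x, axis=1)
    K = np.exp(-(dists ** 2) / (2 * bandwidth ** 2))
    return float(K @ y_demo / K.sum())

X_demo = np.array([[0.0], [1.0], [2.0], [3.0]])   # in-context inputs
y_demo = np.array([0.0, 1.0, 4.0, 9.0])           # in-context outputs
print(kernel_regression(np.array([1.5]), X_demo, y_demo))
```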
    Deep Variational Multivariate Information Bottleneck -- A Framework for Variational Losses. (arXiv:2310.03311v1 [cs.LG])
    Variational dimensionality reduction methods are known for their high accuracy, generative abilities, and robustness. These methods have many theoretical justifications. Here we introduce a unifying principle rooted in information theory to rederive and generalize existing variational methods and design new ones. We base our framework on an interpretation of the multivariate information bottleneck, in which two Bayesian networks are traded off against one another. We interpret the first network as an encoder graph, which specifies what information to keep when compressing the data. We interpret the second network as a decoder graph, which specifies a generative model for the data. Using this framework, we rederive existing dimensionality reduction methods such as the deep variational information bottleneck (DVIB), beta variational auto-encoders (beta-VAE), and deep variational canonical correlation analysis (DVCCA). The framework naturally introduces a trade-off parameter between compression and reconstruction in the DVCCA family of algorithms, resulting in the new beta-DVCCA family. In addition, we derive a new variational dimensionality reduction method, deep variational symmetric informational bottleneck (DVSIB), which simultaneously compresses two variables to preserve information between their compressed representations. We implement all of these algorithms and evaluate their ability to produce shared low dimensional latent spaces on a modified noisy MNIST dataset. We show that algorithms that are better matched to the structure of the data (beta-DVCCA and DVSIB) produce better latent spaces as measured by classification accuracy and the dimensionality of the latent variables. We believe that this framework can be used to unify other multi-view representation learning algorithms. Additionally, it provides a straightforward framework for deriving problem-specific loss functions.
    Memory Capacity of Recurrent Neural Networks with Matrix Representation. (arXiv:2104.07454v3 [cs.LG] UPDATED)
    It is well known that canonical recurrent neural networks (RNNs) face limitations in learning long-term dependencies which have been addressed by memory structures in long short-term memory (LSTM) networks. Neural Turing machines (NTMs) are novel RNNs that implement the notion of programmable computers with neural network controllers that can learn simple algorithmic tasks. Matrix neural networks feature matrix representation which inherently preserves the spatial structure of data when compared to canonical neural networks that use vector-based representation. One may then argue that neural networks with matrix representations may have the potential to provide better memory capacity. In this paper, we define and study a probabilistic notion of memory capacity based on Fisher information for matrix-based RNNs. We find bounds on memory capacity for such networks under various hypotheses and compare them with their vector counterparts. In particular, we show that the memory capacity of such networks is bounded by $N^2$ for $N\times N$ state matrix which generalizes the one known for vector networks. We also show and analyze the increase in memory capacity for such networks which is introduced when one exhibits an external state memory, such as NTMs. Consequently, we construct NTMs with RNN controllers with matrix-based representation of external memory, leading us to introduce Matrix NTMs. We demonstrate the performance of this class of memory networks under certain algorithmic learning tasks such as copying and recall and compare it with Matrix RNNs. We find an improvement in the performance of Matrix NTMs by the addition of external memory, in comparison to Matrix RNNs.
    Distributional PAC-Learning from Nisan's Natural Proofs. (arXiv:2310.03641v1 [cs.CC])
    (Abridged) Carmosino et al. (2016) demonstrated that natural proofs of circuit lower bounds for \Lambda imply efficient algorithms for learning \Lambda-circuits, but only over the uniform distribution, with membership queries, and provided \AC^0[p] \subseteq \Lambda. We consider whether this implication can be generalized to \Lambda \not\supseteq \AC^0[p], and to learning algorithms in Valiant's PAC model, which use only random examples and learn over arbitrary example distributions. We give results of both positive and negative flavor. On the negative side, we observe that if, for every circuit class \Lambda, the implication from natural proofs for \Lambda to learning \Lambda-circuits in Valiant's PAC model holds, then there is a polynomial time solution to O(n^{1.5})-uSVP (unique Shortest Vector Problem), and polynomial time quantum solutions to O(n^{1.5})-SVP (Shortest Vector Problem) and O(n^{1.5})-SIVP (Shortest Independent Vector Problem). This indicates that whether natural proofs for \Lambda imply efficient learning algorithms for \Lambda in Valiant's PAC model may depend on \Lambda. On the positive side, our main result is that specific natural proofs arising from a type of communication complexity argument (e.g., Nisan (1993), for depth-2 majority circuits) imply PAC-learning algorithms in a new distributional variant of Valiant's model. Our distributional PAC model is stronger than the average-case prediction model of Blum et al (1993) and the heuristic PAC model of Nanashima (2021), and has several important properties which make it of independent interest, such as being boosting-friendly. The main applications of our result are new distributional PAC-learning algorithms for depth-2 majority circuits, polytopes and DNFs over natural target distributions, as well as the nonexistence of encoded-input weak PRFs that can be evaluated by depth-2 majority circuits.
    Over-the-Air Federated Learning with Compressed Sensing: Is Sparsification Necessary?. (arXiv:2310.03410v1 [cs.IT])
    Over-the-Air (OtA) Federated Learning (FL) refers to an FL system where multiple agents apply OtA computation for transmitting model updates to a common edge server. Two important features of OtA computation, namely linear processing and signal-level superposition, motivate the use of linear compression with compressed sensing (CS) methods to reduce the number of data samples transmitted over the channel. The previous works on applying CS methods in OtA FL have primarily assumed that the original model update vectors are sparse, or they have been sparsified before compression. However, it is unclear whether linear compression with CS-based reconstruction is more effective than directly sending the non-zero elements in the sparsified update vectors, under the same total power constraint. In this study, we examine and compare several communication designs with or without sparsification. Our findings demonstrate that sparsification before compression is not necessary. Alternatively, sparsification without linear compression can also achieve better performance than the commonly considered setup that combines both.
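    For orientation, the sketch below contrasts the two transmit-side designs compared above, with the same number of transmitted values; the OtA channel model, signal-level aggregation, and CS reconstruction step are all omitted, so this is only an illustrative setup.

```python
# Hedged sketch: top-k sparsification versus random linear compression of a
# model update, with equal numbers of transmitted values (channel, OtA
# aggregation, and CS reconstruction omitted).
import numpy as np

def topk_sparsify(update, k):
    """Keep the k largest-magnitude entries and their indices."""
    idx = np.argsort(-np.abs(update))[:k]
    return idx, update[idx]

def linear_compress(update, k, seed=0):
    """Project the full update onto k random Gaussian measurements."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(k, update.size)) / np.sqrt(k)
    return A, A @ update

update = np.random.default_rng(1).normal(size=1000)
idx, vals = topk_sparsify(update, k=100)   # 100 values plus their indices
A, y = linear_compress(update, k=100)      # 100 linear measurements
```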
    FASER: Binary Code Similarity Search through the use of Intermediate Representations. (arXiv:2310.03605v1 [cs.CR])
    Being able to identify functions of interest in cross-architecture software is useful whether you are analysing for malware, securing the software supply chain or conducting vulnerability research. Cross-Architecture Binary Code Similarity Search has been explored in numerous studies and has used a wide range of different data sources to achieve its goals. The data sources typically used draw on common structures derived from binaries such as function control flow graphs or binary level call graphs, the output of the disassembly process or the outputs of a dynamic analysis approach. One data source which has received less attention is binary intermediate representations. Binary Intermediate representations possess two interesting properties: they are cross architecture by their very nature and encode the semantics of a function explicitly to support downstream usage. Within this paper we propose Function as a String Encoded Representation (FASER) which combines long document transformers with the use of intermediate representations to create a model capable of cross architecture function search without the need for manual feature engineering, pre-training or a dynamic analysis step. We compare our approach against a series of baseline approaches for two tasks; A general function search task and a targeted vulnerability search task. Our approach demonstrates strong performance across both tasks, performing better than all baseline approaches.
    LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers. (arXiv:2310.03294v1 [cs.LG])
    Increasing the context length of large language models (LLMs) unlocks fundamentally new capabilities, but also significantly increases the memory footprint of training. Previous model-parallel systems such as Megatron-LM partition and compute different attention heads in parallel, resulting in large communication volumes, and they cannot scale beyond the number of attention heads, thereby hindering their adoption. In this paper, we introduce a new approach, LightSeq, for long-context LLM training. LightSeq has many notable advantages. First, LightSeq partitions over the sequence dimension, hence is agnostic to model architectures and readily applicable to models with varying numbers of attention heads, such as Multi-Head, Multi-Query and Grouped-Query attention. Second, LightSeq not only requires up to 4.7x less communication than Megatron-LM on popular LLMs but also overlaps the communication with computation. To further reduce the training time, LightSeq features a novel gradient checkpointing scheme to bypass a forward computation for memory-efficient attention. We evaluate LightSeq on Llama-7B and its variants with sequence lengths from 32K to 512K. Through comprehensive experiments on single and cross-node training, we show that LightSeq achieves up to 1.24-2.01x end-to-end speedup, and a 2-8x longer sequence length on models with fewer heads, compared to Megatron-LM. Code will be available at https://github.com/RulinShao/LightSeq.
    Neural architecture impact on identifying temporally extended Reinforcement Learning tasks. (arXiv:2310.03161v1 [cs.LG])
    Inspired by recent developments in attention models for image classification and natural language processing, we present various attention-based architectures in the reinforcement learning (RL) domain, capable of performing well on the OpenAI Gym Atari-2600 game suite. In spite of the recent success of deep reinforcement learning techniques in various fields like robotics, gaming and healthcare, they suffer from the major drawback that neural networks are difficult to interpret. We try to get around this problem with the help of attention-based models. In attention-based models, extracting and overlaying the attention map onto images allows for direct observation of the information used by the agent to select actions and easier interpretation of the logic behind the chosen actions. Our models, in addition to playing well on gym-Atari environments, also provide insights on how the agent perceives its environment. In addition, motivated by recent developments in attention-based video-classification models using the Vision Transformer, we also propose an architecture based on the Vision Transformer for the image-based RL domain. Compared to previous works using the Vision Transformer, our model is faster to train and requires fewer computational resources.
    SAF: Smart Aggregation Framework for Revealing Atoms Importance Rank and Improving Prediction Rates in Drug Discovery. (arXiv:2310.03028v1 [physics.chem-ph])
    Machine learning, and representation learning in particular, has the potential to facilitate drug discovery by screening a large chemical space in silico. A successful approach for representing molecules is to treat them as a graph and utilize graph neural networks. One of the key limitations of such methods is the necessity to represent compounds with different numbers of atoms, which requires aggregating the atoms' information. Common aggregation operators, such as averaging, result in loss of information at the atom level. In this work, we propose a novel aggregation approach where each atom is weighted non-linearly using the Boltzmann distribution with a hyperparameter analogous to temperature. We show that using this weighted aggregation improves the ability of the gold-standard message-passing neural network to predict antibiotic activity. Moreover, by changing the temperature hyperparameter, our approach can reveal the atoms that are important for activity prediction in a smooth and consistent way, thus providing a novel, regulated attention mechanism for graph neural networks. We further validate our method by showing that it recapitulates the functional group in beta-Lactam antibiotics. The ability of our approach to rank the atoms' importance for a desired function can be used within any graph neural network to provide interpretability of the results and predictions at the node level.
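    A rough illustration of this weighting scheme follows (a minimal sketch; the name boltzmann_aggregate and the per-atom score readout are illustrative assumptions, not the paper's exact formulation): each atom receives a Boltzmann/softmax weight controlled by a temperature hyperparameter before its features are pooled into a molecule embedding.

    import torch

    def boltzmann_aggregate(atom_feats: torch.Tensor, atom_scores: torch.Tensor,
                            temperature: float = 1.0) -> torch.Tensor:
        # Weight atom i by exp(s_i / T) / sum_j exp(s_j / T), i.e. a Boltzmann
        # (softmax) distribution over atoms; T controls how sharply the pooling
        # concentrates on a few atoms.
        weights = torch.softmax(atom_scores / temperature, dim=0)   # (num_atoms,)
        return (weights.unsqueeze(-1) * atom_feats).sum(dim=0)      # (feat_dim,)

    # Toy usage: 5 atoms with 8-dimensional GNN embeddings.
    feats = torch.randn(5, 8)
    scores = torch.randn(5)          # hypothetical learned per-atom scores
    mol_embedding = boltzmann_aggregate(feats, scores, temperature=0.5)

    Lowering the temperature concentrates the weights on the highest-scoring atoms (approaching a max-style readout), while raising it recovers a nearly uniform average.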
    Non-Smooth Weakly-Convex Finite-sum Coupled Compositional Optimization. (arXiv:2310.03234v1 [math.OC])
    This paper investigates new families of compositional optimization problems, called $\underline{\bf n}$on-$\underline{\bf s}$mooth $\underline{\bf w}$eakly-$\underline{\bf c}$onvex $\underline{\bf f}$inite-sum $\underline{\bf c}$oupled $\underline{\bf c}$ompositional $\underline{\bf o}$ptimization (NSWC FCCO). There has been a growing interest in FCCO due to its wide-ranging applications in machine learning and AI, as well as its ability to address the shortcomings of stochastic algorithms based on empirical risk minimization. However, current research on FCCO presumes that both the inner and outer functions are smooth, limiting their potential to tackle a more diverse set of problems. Our research expands on this area by examining non-smooth weakly-convex FCCO, where the outer function is weakly convex and non-decreasing, and the inner function is weakly convex. We analyze a single-loop algorithm and establish its complexity for finding an $\epsilon$-stationary point of the Moreau envelope of the objective function. Additionally, we extend the algorithm to solving novel non-smooth weakly-convex tri-level finite-sum coupled compositional optimization problems, which feature a nested arrangement of three functions. Lastly, we explore the applications of our algorithms in deep learning for two-way partial AUC maximization and multi-instance two-way partial AUC maximization, using empirical studies to showcase the effectiveness of the proposed algorithms.
    Graph-enhanced Optimizers for Structure-aware Recommendation Embedding Evolution. (arXiv:2310.03032v1 [cs.IR])
    Embedding plays a critical role in modern recommender systems because they are virtual representations of real-world entities and the foundation for subsequent decision models. In this paper, we propose a novel embedding update mechanism, Structure-aware Embedding Evolution (SEvo for short), to encourage related nodes to evolve similarly at each step. Unlike GNN (Graph Neural Network) that typically serves as an intermediate part, SEvo is able to directly inject the graph structure information into embedding with negligible computational overhead in training. The convergence properties of SEvo as well as its possible variants are theoretically analyzed to justify the validity of the designs. Moreover, SEvo can be seamlessly integrated into existing optimizers for state-of-the-art performance. In particular, SEvo-enhanced AdamW with moment estimate correction demonstrates consistent improvements across a spectrum of models and datasets, suggesting a novel technical route to effectively utilize graph structure information beyond explicit GNN modules.
    Efficient Federated Prompt Tuning for Black-box Large Pre-trained Models. (arXiv:2310.03123v1 [cs.LG])
    With the blowout development of pre-trained models (PTMs), the efficient tuning of these models for diverse downstream applications has emerged as a pivotal research concern. Although recent investigations into prompt tuning have provided promising avenues, three salient challenges persist: (1) memory constraint: the continuous growth in the size of open-source PTMs renders fine-tuning even a fraction of their parameters challenging for many practitioners. (2) model privacy: existing PTMs often function as public API services, with their parameters inaccessible for effective or tailored fine-tuning. (3) data privacy: the fine-tuning of PTMs necessitates high-quality datasets, which are typically localized and not shared with the public. To optimally harness each local dataset while navigating memory constraints and preserving privacy, we propose Federated Black-Box Prompt Tuning (Fed-BBPT). This innovative approach eschews reliance on parameter architectures and private dataset access, instead capitalizing on a central server that aids local users in collaboratively training a prompt generator through regular aggregation. Local users leverage API-driven learning via a zero-order optimizer, obviating the need for PTM deployment. Relative to extensive fine-tuning, Fed-BBPT proficiently sidesteps memory challenges tied to PTM storage and fine-tuning on local machines, tapping into comprehensive, high-quality, yet private training datasets. A thorough evaluation across 40 datasets spanning CV and NLP tasks underscores the robustness of our proposed model.
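    A minimal sketch of the zero-order idea referenced above (a generic two-point gradient estimator; the function zo_gradient and the toy loss are illustrative assumptions rather than Fed-BBPT's exact optimizer): the prompt generator can be updated using only loss values returned by the PTM's API, with no backpropagation through the model.

    import numpy as np

    def zo_gradient(f, theta: np.ndarray, mu: float = 1e-2, n_samples: int = 10) -> np.ndarray:
        # Two-point zero-order estimate: probe f along random directions u and
        # use finite differences, so only black-box loss evaluations are needed.
        grad = np.zeros_like(theta)
        for _ in range(n_samples):
            u = np.random.randn(*theta.shape)
            grad += (f(theta + mu * u) - f(theta - mu * u)) / (2.0 * mu) * u
        return grad / n_samples

    # Toy usage: tune a continuous prompt vector against a stand-in API-scored loss.
    api_loss = lambda p: float(np.sum((p - 1.0) ** 2))
    prompt = np.zeros(16)
    for _ in range(200):
        prompt -= 0.05 * zo_gradient(api_loss, prompt)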
    Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly. (arXiv:2310.03150v1 [cs.LG])
    Large Language Models (LLM) and foundation models are popular as they offer new opportunities for individuals and businesses to improve natural language processing, interact with data, and retrieve information faster. However, training or fine-tuning LLMs requires a vast amount of data, which can be challenging to access due to legal or technical restrictions and may require private computing resources. Federated Learning (FL) is a solution designed to overcome these challenges and expand data access for deep learning applications. This paper takes a hardware-centric approach to explore how LLMs can be brought to modern edge computing systems. Our study fine-tunes the FLAN-T5 model family, ranging from 80M to 3B parameters, using FL for a text summarization task. We provide a micro-level hardware benchmark, compare the model FLOP utilization to a state-of-the-art data center GPU, and study the network utilization in realistic conditions. Our contribution is twofold: First, we evaluate the current capabilities of edge computing systems and their potential for LLM FL workloads. Second, by comparing these systems with a data-center GPU, we demonstrate the potential for improvement and the next steps toward achieving greater computational efficiency at the edge.
    Digital Ethics in Federated Learning. (arXiv:2310.03178v1 [cs.LG])
    The Internet of Things (IoT) consistently generates vast amounts of data, sparking increasing concern over the protection of data privacy and the limitation of data misuse. Federated learning (FL) facilitates collaborative capabilities among multiple parties by sharing machine learning (ML) model parameters instead of raw user data, and it has recently gained significant attention for its potential in privacy preservation and learning efficiency enhancement. In this paper, we highlight the digital ethics concerns that arise when human-centric devices serve as clients in FL. More specifically, challenges of game dynamics, fairness, incentive, and continuity arise in FL due to differences in perspectives and objectives between clients and the server. We analyze these challenges and their solutions from the perspectives of both the client and the server, and through the viewpoints of centralized and decentralized FL. Finally, we explore the opportunities in FL for human-centric IoT as directions for future development.
    UniPredict: Large Language Models are Universal Tabular Predictors. (arXiv:2310.03266v1 [cs.LG])
    Tabular data prediction is a fundamental machine learning task for many applications. Existing methods predominantly employ discriminative modeling and operate under the assumption of a fixed target column, necessitating re-training for every new predictive task. Inspired by the generative power of large language models (LLMs), this paper exploits the idea of building universal tabular data predictors based on generative modeling, namely UniPredict. Here, we show that an LLM can be scaled up to extensive tabular datasets and endowed with the capability of comprehending diverse tabular inputs and predicting target variables following the input instructions. Specifically, we train a single LLM on an aggregation of 169 tabular datasets with diverse targets and compare its performance against baselines that are trained on each dataset separately. We observe this versatile UniPredict model demonstrates an advantage over other models, ranging from 5.4% to 13.4%, when compared with the best tree-boosting baseline and the best neural network baseline, respectively. We further test UniPredict in few-shot learning settings on another 62 tabular datasets. Our method achieves strong performance in quickly adapting to new tasks, where it outperforms XGBoost by over 100% in the low-resource setup and shows a significant margin over all baselines. We envision that UniPredict sheds light on developing a universal tabular data prediction system that learns from data at scale and serves a wide range of prediction tasks.
    OpenMM 8: Molecular Dynamics Simulation with Machine Learning Potentials. (arXiv:2310.03121v1 [physics.chem-ph])
    Machine learning plays an important and growing role in molecular simulation. The newest version of the OpenMM molecular dynamics toolkit introduces new features to support the use of machine learning potentials. Arbitrary PyTorch models can be added to a simulation and used to compute forces and energy. A higher-level interface allows users to easily model their molecules of interest with general purpose, pretrained potential functions. A collection of optimized CUDA kernels and custom PyTorch operations greatly improves the speed of simulations. We demonstrate these features on simulations of cyclin-dependent kinase 8 (CDK8) and the green fluorescent protein (GFP) chromophore in water. Taken together, these features make it practical to use machine learning to improve the accuracy of simulations at only a modest increase in cost.
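    A minimal sketch of the PyTorch-potential workflow described above (assuming the openmm-torch plugin; exact import paths, class names, and unit conventions may vary across versions, so treat this as illustrative rather than the toolkit's definitive API):

    import torch
    from openmm import System
    from openmmtorch import TorchForce

    class HarmonicRestraint(torch.nn.Module):
        # Toy potential: pulls every particle toward the origin. forward()
        # receives particle positions (nm) and returns a scalar energy (kJ/mol);
        # forces are obtained by automatic differentiation.
        def forward(self, positions):
            return 100.0 * torch.sum(positions ** 2)

    torch.jit.script(HarmonicRestraint()).save("restraint.pt")

    system = System()
    system.addParticle(12.0)                      # one particle, mass in amu
    system.addForce(TorchForce("restraint.pt"))   # the PyTorch model now supplies energy/forces

    From here the system would be combined with an integrator and a Simulation object in the usual OpenMM way; the same mechanism is what lets pretrained machine learning potentials drive production simulations.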
    Multi-Task Learning For Reduced Popularity Bias In Multi-Territory Video Recommendations. (arXiv:2310.03148v1 [cs.IR])
    Various data imbalances that naturally arise in a multi-territory personalized recommender system can lead to a significant item bias for globally prevalent items. A locally popular item can be overshadowed by a globally prevalent item. Moreover, users' viewership patterns/statistics can drastically change from one geographic location to another, which may suggest learning location-specific user embeddings. In this paper, we propose a multi-task learning (MTL) technique, along with an adaptive upsampling method, to reduce popularity bias in multi-territory recommendations. Our proposed framework is designed to enrich training examples with active users' representation through upsampling, and is capable of learning geographic-based user embeddings by leveraging MTL. Through experiments, we demonstrate the effectiveness of our framework in multiple territories compared to a baseline not incorporating our proposed techniques. Notably, we show an improved relative gain of up to $65.27\%$ in the PR-AUC metric. A case study is presented to demonstrate the advantages of our methods in attenuating the popularity bias of global items.
    Deep Learning in Computational Biology: Advancements, Challenges, and Future Outlook. (arXiv:2310.03086v1 [cs.LG])
    Deep learning has become a powerful tool in computational biology, revolutionising the analysis and interpretation of biological data over time. In our article review, we delve into various aspects of deep learning in computational biology. Specifically, we examine its history, advantages, and challenges. Our focus is on two primary applications: DNA sequence classification and prediction, as well as protein structure prediction from sequence data. Additionally, we provide insights into the outlook for this field. To fully harness the potential of deep learning in computational biology, it is crucial to address the challenges that come with it. These challenges include the requirement for large, labelled datasets and the interpretability of deep learning models. The use of deep learning in the analysis of DNA sequences has brought about a significant transformation in the detection of genomic variants and the analysis of gene expression. This has greatly contributed to the advancement of personalised medicine and drug discovery. Convolutional neural networks (CNNs) have been shown to be highly accurate in predicting genetic variations and gene expression levels. Deep learning techniques are used for analysing epigenetic data, including DNA methylation and histone modifications. This provides valuable insights into metabolic conditions and gene regulation. The field of protein structure prediction has been significantly impacted by deep learning, which has enabled accurate determination of the three-dimensional shape of proteins and prediction of their interactions. The future of deep learning in computational biology looks promising. With the development of advanced deep learning models and interpretation techniques, there is potential to overcome current challenges and further our understanding of biological systems.
    Deep reinforcement learning for machine scheduling: Methodology, the state-of-the-art, and future directions. (arXiv:2310.03195v1 [cs.LG])
    Machine scheduling aims to optimize job assignments to machines while adhering to manufacturing rules and job specifications. This optimization leads to reduced operational costs, improved customer demand fulfillment, and enhanced production efficiency. However, machine scheduling remains a challenging combinatorial problem due to its NP-hard nature. Deep Reinforcement Learning (DRL), a key component of artificial general intelligence, has shown promise in various domains like gaming and robotics. Researchers have explored applying DRL to machine scheduling problems since 1995. This paper offers a comprehensive review and comparison of DRL-based approaches, highlighting their methodology, applications, advantages, and limitations. It categorizes these approaches based on computational components: conventional neural networks, encoder-decoder architectures, graph neural networks, and metaheuristic algorithms. Our review concludes that DRL-based methods outperform exact solvers, heuristics, and tabular reinforcement learning algorithms in terms of computation speed and generating near-global optimal solutions. These DRL-based approaches have been successfully applied to static and dynamic scheduling across diverse machine environments and job characteristics. However, DRL-based schedulers face limitations in handling complex operational constraints, configurable multi-objective optimization, generalization, scalability, interpretability, and robustness. Addressing these challenges will be a crucial focus for future research in this field. This paper serves as a valuable resource for researchers to assess the current state of DRL-based machine scheduling and identify research gaps. It also aids experts and practitioners in selecting the appropriate DRL approach for production scheduling.
    Learning Energy-Based Prior Model with Diffusion-Amortized MCMC. (arXiv:2310.03218v1 [cs.LG])
    Latent space Energy-Based Models (EBMs), also known as energy-based priors, have drawn growing interest in the field of generative modeling due to their flexibility in formulation and the strong modeling power of the latent space. However, the common practice of learning latent space EBMs with non-convergent short-run MCMC for prior and posterior sampling is hindering the model from further progress; the degenerate MCMC sampling quality in practice often leads to degraded generation quality and instability in training, especially with highly multi-modal and/or high-dimensional target distributions. To remedy this sampling issue, in this paper we introduce a simple but effective diffusion-based amortization method for long-run MCMC sampling and develop a novel learning algorithm for the latent space EBM based on it. We provide theoretical evidence that the learned amortization of MCMC is a valid long-run MCMC sampler. Experiments on several image modeling benchmark datasets demonstrate the superior performance of our method compared with strong counterparts.
    Attributing Learned Concepts in Neural Networks to Training Data. (arXiv:2310.03149v1 [cs.LG])
    By now there is substantial evidence that deep learning models learn certain human-interpretable features as part of their internal representations of data. As having the right (or wrong) concepts is critical to trustworthy machine learning systems, it is natural to ask which inputs from the model's original training set were most important for learning a concept at a given layer. To answer this, we combine data attribution methods with methods for probing the concepts learned by a model. Training network and probe ensembles for two concept datasets on a range of network layers, we use the recently developed TRAK method for large-scale data attribution. We find some evidence for convergence, where removing the 10,000 top attributing images for a concept and retraining the model does not change the location of the concept in the network nor the probing sparsity of the concept. This suggests that rather than being highly dependent on a few specific examples, the features that inform the development of a concept are spread in a more diffuse manner across its exemplars, implying robustness in concept formation.
    Fairness-enhancing mixed effects deep learning improves fairness on in- and out-of-distribution clustered (non-iid) data. (arXiv:2310.03146v1 [cs.LG])
    Traditional deep learning (DL) suffers from two core problems. Firstly, it assumes training samples are independent and identically distributed. However, numerous real-world datasets group samples by shared measurements (e.g., study participants or cells), violating this assumption. In these scenarios, DL can show compromised performance, limited generalization, and interpretability issues, coupled with cluster confounding causing Type 1 and 2 errors. Secondly, models are typically trained for overall accuracy, often neglecting underrepresented groups and introducing biases in crucial areas like loan approvals or determining health insurance rates; such biases can significantly impact one's quality of life. To address both of these challenges simultaneously, we present a mixed effects deep learning (MEDL) framework. MEDL separately quantifies cluster-invariant fixed effects (FE) and cluster-specific random effects (RE) through the introduction of: 1) a cluster adversary which encourages the learning of cluster-invariant FE, 2) a Bayesian neural network which quantifies the RE, and 3) a mixing function combining the FE and RE into a mixed-effects prediction. We marry this MEDL with adversarial debiasing, which promotes equality-of-odds fairness across FE, RE, and ME predictions for fairness-sensitive variables. We evaluated our approach using three datasets: two from census/finance focusing on income classification and one from healthcare predicting hospitalization duration, a regression task. Our framework notably enhances fairness across all sensitive variables, increasing fairness up to 82% for age, 43% for race, 86% for sex, and 27% for marital status. Besides promoting fairness, our method maintains the robust performance and clarity of MEDL. It is versatile, suitable for various dataset types and tasks, making it broadly applicable. Our GitHub repository houses the implementation.
    Robust and Interpretable Medical Image Classifiers via Concept Bottleneck Models. (arXiv:2310.03182v1 [cs.CV])
    Medical image classification is a critical problem for healthcare, with the potential to alleviate the workload of doctors and facilitate diagnoses of patients. However, two challenges arise when deploying deep learning models to real-world healthcare applications. First, neural models tend to learn spurious correlations instead of desired features, which could fall short when generalizing to new domains (e.g., patients with different ages). Second, these black-box models lack interpretability. When making diagnostic predictions, it is important to understand why a model makes a decision for trustworthy and safety considerations. In this paper, to address these two limitations, we propose a new paradigm to build robust and interpretable medical image classifiers with natural language concepts. Specifically, we first query clinical concepts from GPT-4, then transform latent image features into explicit concepts with a vision-language model. We systematically evaluate our method on eight medical image classification datasets to verify its effectiveness. On challenging datasets with strong confounding factors, our method can mitigate spurious correlations and thus substantially outperforms standard visual encoders and other baselines. Finally, we show how classification with a small number of concepts brings a level of interpretability for understanding model decisions through case studies in real medical data.
    Context-Based Tweet Engagement Prediction. (arXiv:2310.03147v1 [cs.IR])
    Twitter is currently one of the biggest social media platforms. Its users may share, read, and engage with short posts called tweets. For the ACM Recommender Systems Conference 2020, Twitter published a dataset around 70 GB in size for the annual RecSys Challenge. In 2020, the RecSys Challenge invited participating teams to create models that would predict engagement likelihoods for given user-tweet combinations. The submitted models predicting like, reply, retweet, and quote engagements were evaluated based on two metrics: area under the precision-recall curve (PRAUC) and relative cross-entropy (RCE). In this diploma thesis, we used the RecSys 2020 Challenge dataset and evaluation procedure to investigate how well context alone may be used to predict tweet engagement likelihood. In doing so, we employed the Spark engine on TU Wien's Little Big Data Cluster to create scalable data preprocessing, feature engineering, feature selection, and machine learning pipelines. We manually created just under 200 additional features to describe tweet context. The results indicate that features describing users' prior engagement history and the popularity of hashtags and links in the tweet were the most informative. We also found that factors such as the prediction algorithm, training dataset size, training dataset sampling method, and feature selection significantly affect the results. After comparing the best results of our context-only prediction models with content-only models and with models developed by the Challenge winners, we identified that the context-based models underperformed in terms of the RCE score. This work thus concludes by situating this discrepancy and proposing potential improvements to our implementation, which is shared in a public git repository.
    Untargeted White-box Adversarial Attack with Heuristic Defence Methods in Real-time Deep Learning based Network Intrusion Detection System. (arXiv:2310.03334v1 [cs.LG])
    A Network Intrusion Detection System (NIDS) is a key component in securing the computer network from various cyber security threats and network attacks. However, consider the unfortunate situation where the NIDS is itself attacked and vulnerable; more specifically, how do we defend the defender? In Adversarial Machine Learning (AML), malicious actors aim to fool Machine Learning (ML) and Deep Learning (DL) models into producing incorrect predictions with intentionally crafted adversarial examples. These adversarially perturbed examples have become the biggest vulnerability of ML- and DL-based systems and are major obstacles to their adoption in real-time and mission-critical applications such as NIDS. AML is an emerging research domain, and an in-depth study of adversarial attacks and their defence strategies has become a necessity to safeguard the computer network from various cyber security threats. In this research work, we aim to cover important aspects related to NIDS, adversarial attacks, and their defence mechanisms to increase the robustness of ML- and DL-based NIDS. We implemented four powerful adversarial attack techniques, namely Fast Gradient Sign Method (FGSM), Jacobian Saliency Map Attack (JSMA), Projected Gradient Descent (PGD) and Carlini & Wagner (C&W), in the NIDS, and analyzed their performance in terms of various metrics in detail. Furthermore, three heuristic defence strategies, i.e., Adversarial Training (AT), Gaussian Data Augmentation (GDA) and High Confidence (HC), are implemented to improve the NIDS robustness under adversarial attack situations. The complete workflow is demonstrated in a real-time network with data packet flow. This research work provides the overall background for researchers interested in AML and its implementation from a computer network security point of view.
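    As a brief illustration of the first attack listed above, here is a minimal FGSM sketch (a generic PyTorch implementation under the usual assumption of a differentiable classifier over feature vectors; the nids_model, flows, and labels names in the usage line are hypothetical):

    import torch
    import torch.nn.functional as F

    def fgsm_attack(model, x, y, epsilon: float = 0.05):
        # Fast Gradient Sign Method: x_adv = x + epsilon * sign(grad_x loss).
        x_adv = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        return (x_adv + epsilon * x_adv.grad.sign()).detach()

    # Hypothetical usage on a batch of network-flow feature vectors:
    # adversarial_flows = fgsm_attack(nids_model, flows, labels, epsilon=0.05)

    Adversarial training then simply mixes such perturbed examples back into the training batches.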
    Dual Prompt Tuning for Domain-Aware Federated Learning. (arXiv:2310.03103v1 [cs.LG])
    Federated learning is a distributed machine learning paradigm that allows multiple clients to collaboratively train a shared model with their local data. Nonetheless, conventional federated learning algorithms often struggle to generalize well due to the ubiquitous domain shift across clients. In this work, we consider a challenging yet realistic federated learning scenario where the training data of each client originates from different domains. We address the challenges of domain shift by leveraging the technique of prompt learning, and propose a novel method called Federated Dual Prompt Tuning (Fed-DPT). Specifically, Fed-DPT employs a pre-trained vision-language model and then applies both visual and textual prompt tuning to facilitate domain adaptation over decentralized data. Extensive experiments of Fed-DPT demonstrate its significant effectiveness in domain-aware federated learning. With a pre-trained CLIP model (ViT-Base as image encoder), the proposed Fed-DPT attains 68.4% average accuracy over six domains in the DomainNet dataset, which improves the original CLIP by a large margin of 14.8%.
  • Open

    Banach Space Optimality of Neural Architectures With Multivariate Nonlinearities. (arXiv:2310.03696v1 [stat.ML])
    We investigate the variational optimality (specifically, the Banach space optimality) of a large class of neural architectures with multivariate nonlinearities/activation functions. To that end, we construct a new family of Banach spaces defined via a regularization operator and the $k$-plane transform. We prove a representer theorem that states that the solution sets to learning problems posed over these Banach spaces are completely characterized by neural architectures with multivariate nonlinearities. These optimal architectures have skip connections and are tightly connected to orthogonal weight normalization and multi-index models, both of which have received considerable interest in the neural network community. Our framework is compatible with a number of classical nonlinearities including the rectified linear unit (ReLU) activation function, the norm activation function, and the radial basis functions found in the theory of thin-plate/polyharmonic splines. We also show that the underlying spaces are special instances of reproducing kernel Banach spaces and variation spaces. Our results shed light on the regularity of functions learned by neural networks trained on data, particularly with multivariate nonlinearities, and provide new theoretical motivation for several architectural choices found in practice.
    Stable Training of Probabilistic Models Using the Leave-One-Out Maximum Log-Likelihood Objective. (arXiv:2310.03556v1 [stat.ML])
    Probabilistic modelling of power systems operation and planning processes depends on data-driven methods, which require sufficiently large datasets. When historical data lacks this, it is desired to model the underlying data generation mechanism as a probability distribution to assess the data quality and generate more data, if needed. Kernel density estimation (KDE) based models are popular choices for this task, but they fail to adapt to data regions with varying densities. In this paper, an adaptive KDE model is employed to circumvent this, where each kernel in the model has an individual bandwidth. The leave-one-out maximum log-likelihood (LOO-MLL) criterion is proposed to prevent the singular solutions that the regular MLL criterion gives rise to, and it is proven that LOO-MLL prevents these. Relying on this guaranteed robustness, the model is extended by assigning learnable weights to the kernels. In addition, a modified expectation-maximization algorithm is employed to accelerate the optimization speed reliably. The performance of the proposed method and models are exhibited on two power systems datasets using different statistical tests and by comparison with Gaussian mixture models. Results show that the proposed models have promising performance, in addition to their singularity prevention guarantees.
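    The following one-dimensional sketch (an unweighted illustration under Gaussian kernels; the paper's full model also carries learnable kernel weights and a modified expectation-maximization optimizer) shows why the leave-one-out criterion avoids the singular solution in which a bandwidth collapses onto its own data point:

    import numpy as np

    def loo_log_likelihood(x: np.ndarray, h: np.ndarray) -> float:
        # Adaptive Gaussian KDE with an individual bandwidth h[j] per kernel.
        # Scoring x[i] without its own kernel removes the degenerate maximiser
        # h[i] -> 0 of the ordinary maximum log-likelihood criterion.
        n = len(x)
        diffs = x[:, None] - x[None, :]
        k = np.exp(-0.5 * (diffs / h[None, :]) ** 2) / (np.sqrt(2 * np.pi) * h[None, :])
        np.fill_diagonal(k, 0.0)                 # leave-one-out
        return float(np.sum(np.log(k.sum(axis=1) / (n - 1))))

    x = np.random.default_rng(0).normal(size=200)
    h = np.full(200, 0.3)
    print(loo_log_likelihood(x, h))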
    On Convergence of Federated Averaging Langevin Dynamics. (arXiv:2112.05120v4 [stat.ML] UPDATED)
    We propose a federated averaging Langevin algorithm (FA-LD) for uncertainty quantification and mean predictions with distributed clients. In particular, we generalize beyond normal posterior distributions and consider a general class of models. We develop theoretical guarantees for FA-LD for strongly log-concave distributions with non-i.i.d data and study how the injected noise and the stochastic-gradient noise, the heterogeneity of data, and the varying learning rates affect the convergence. Such an analysis sheds light on the optimal choice of local updates to minimize communication costs. Important to our approach is that the communication efficiency does not deteriorate with the injected noise in the Langevin algorithms. In addition, we examine in our FA-LD algorithm both independent and correlated noise used over different clients. We observe there is a trade-off between the pairs among communication, accuracy, and data privacy. As local devices may become inactive in federated networks, we also show convergence results based on different averaging schemes where only partial device updates are available. In such a case, we discover an additional bias that does not decay to zero.
    Anytime-valid t-tests and confidence sequences for Gaussian means with unknown variance. (arXiv:2310.03722v1 [math.ST])
    In 1976, Lai constructed a nontrivial confidence sequence for the mean $\mu$ of a Gaussian distribution with unknown variance $\sigma$. Curiously, he employed both an improper (right Haar) mixture over $\sigma$ and an improper (flat) mixture over $\mu$. Here, we elaborate carefully on the details of his construction, which use generalized nonintegrable martingales and an extended Ville's inequality. While this does yield a sequential t-test, it does not yield an ``e-process'' (due to the nonintegrability of his martingale). In this paper, we develop two new e-processes and confidence sequences for the same setting: one is a test martingale in a reduced filtration, while the other is an e-process in the canonical data filtration. These are respectively obtained by swapping Lai's flat mixture for a Gaussian mixture, and swapping the right Haar mixture over $\sigma$ with the maximum likelihood estimate under the null, as done in universal inference. We also analyze the width of resulting confidence sequences, which have a curious dependence on the error probability $\alpha$. Numerical experiments are provided along the way to compare and contrast the various approaches.
    A Probabilistic Graph Coupling View of Dimension Reduction. (arXiv:2201.13053v3 [math.PR] UPDATED)
    Most popular dimension reduction (DR) methods like t-SNE and UMAP are based on minimizing a cost between input and latent pairwise similarities. Though widely used, these approaches lack clear probabilistic foundations to enable a full understanding of their properties and limitations. To that extent, we introduce a unifying statistical framework based on the coupling of hidden graphs using cross entropy. These graphs induce a Markov random field dependency structure among the observations in both input and latent spaces. We show that existing pairwise similarity DR methods can be retrieved from our framework with particular choices of priors for the graphs. Moreover this reveals that these methods suffer from a statistical deficiency that explains poor performances in conserving coarse-grain dependencies. Our model is leveraged and extended to address this issue while new links are drawn with Laplacian eigenmaps and PCA.
    CLEVRER-Humans: Describing Physical and Causal Events the Human Way. (arXiv:2310.03635v1 [cs.AI])
    Building machines that can reason about physical events and their causal relationships is crucial for flexible interaction with the physical world. However, most existing physical and causal reasoning benchmarks are exclusively based on synthetically generated events and synthetic natural language descriptions of causal relationships. This design brings up two issues. First, there is a lack of diversity in both event types and natural language descriptions; second, causal relationships based on manually-defined heuristics are different from human judgments. To address both shortcomings, we present the CLEVRER-Humans benchmark, a video reasoning dataset for causal judgment of physical events with human labels. We employ two techniques to improve data collection efficiency: first, a novel iterative event cloze task to elicit a new representation of events in videos, which we term Causal Event Graphs (CEGs); second, a data augmentation technique based on neural language generative models. We convert the collected CEGs into questions and answers to be consistent with prior work. Finally, we study a collection of baseline approaches for CLEVRER-Humans question-answering, highlighting the great challenges set forth by our benchmark.
    High-dimensional Bayesian Optimization with Group Testing. (arXiv:2310.03515v1 [cs.LG])
    Bayesian optimization is an effective method for optimizing expensive-to-evaluate black-box functions. High-dimensional problems are particularly challenging as the surrogate model of the objective suffers from the curse of dimensionality, which makes accurate modeling difficult. We propose a group testing approach to identify active variables to facilitate efficient optimization in these domains. The proposed algorithm, Group Testing Bayesian Optimization (GTBO), first runs a testing phase where groups of variables are systematically selected and tested on whether they influence the objective. To that end, we extend the well-established theory of group testing to functions of continuous ranges. In the second phase, GTBO guides optimization by placing more importance on the active dimensions. By exploiting the axis-aligned subspace assumption, GTBO is competitive against state-of-the-art methods on several synthetic and real-world high-dimensional optimization tasks. Furthermore, GTBO aids in the discovery of active parameters in applications, thereby enhancing practitioners' understanding of the problem at hand.
    Leveraging Model-based Trees as Interpretable Surrogate Models for Model Distillation. (arXiv:2310.03112v1 [stat.ML])
    Surrogate models play a crucial role in retrospectively interpreting complex and powerful black box machine learning models via model distillation. This paper focuses on using model-based trees as surrogate models which partition the feature space into interpretable regions via decision rules. Within each region, interpretable models based on additive main effects are used to approximate the behavior of the black box model, striking an optimal balance between interpretability and performance. Four model-based tree algorithms, namely SLIM, GUIDE, MOB, and CTree, are compared regarding their ability to generate such surrogate models. We investigate fidelity, interpretability, stability, and the algorithms' capability to capture interaction effects through appropriate splits. Based on our comprehensive analyses, we finally provide an overview of user-specific recommendations.
    Sharpness-Aware Minimization and the Edge of Stability. (arXiv:2309.12488v3 [cs.LG] UPDATED)
    Recent experiments have shown that, often, when training a neural network with gradient descent (GD) with a step size $\eta$, the operator norm of the Hessian of the loss grows until it approximately reaches $2/\eta$, after which it fluctuates around this value. The quantity $2/\eta$ has been called the "edge of stability" based on consideration of a local quadratic approximation of the loss. We perform a similar calculation to arrive at an "edge of stability" for Sharpness-Aware Minimization (SAM), a variant of GD which has been shown to improve its generalization. Unlike the case for GD, the resulting SAM-edge depends on the norm of the gradient. Using three deep learning training tasks, we see empirically that SAM operates on the edge of stability identified by this analysis.
    Quantitative CLTs in Deep Neural Networks. (arXiv:2307.06092v4 [cs.LG] UPDATED)
    We study the distribution of a fully connected neural network with random Gaussian weights and biases in which the hidden layer widths are proportional to a large constant $n$. Under mild assumptions on the non-linearity, we obtain quantitative bounds on normal approximations valid at large but finite $n$ and any fixed network depth. Our theorems show both for the finite-dimensional distributions and the entire process, that the distance between a random fully connected network (and its derivatives) to the corresponding infinite width Gaussian process scales like $n^{-\gamma}$ for $\gamma>0$, with the exponent depending on the metric used to measure discrepancy. Our bounds are strictly stronger in terms of their dependence on network width than any previously available in the literature; in the one-dimensional case, we also prove that they are optimal, i.e., we establish matching lower bounds.
    Sparse Deep Learning for Time Series Data: Theory and Applications. (arXiv:2310.03243v1 [stat.ML])
    Sparse deep learning has become a popular technique for improving the performance of deep neural networks in areas such as uncertainty quantification, variable selection, and large-scale network compression. However, most existing research has focused on problems where the observations are independent and identically distributed (i.i.d.), and there has been little work on the problems where the observations are dependent, such as time series data and sequential data in natural language processing. This paper aims to address this gap by studying the theory for sparse deep learning with dependent data. We show that sparse recurrent neural networks (RNNs) can be consistently estimated, and their predictions are asymptotically normally distributed under appropriate assumptions, enabling the prediction uncertainty to be correctly quantified. Our numerical results show that sparse deep learning outperforms state-of-the-art methods, such as conformal predictions, in prediction uncertainty quantification for time series data. Furthermore, our results indicate that the proposed method can consistently identify the autoregressive order for time series data and outperform existing methods in large-scale model compression. Our proposed method has important practical implications in fields such as finance, healthcare, and energy, where both accurate point estimates and prediction uncertainty quantification are of concern.
    Interpolating between Clustering and Dimensionality Reduction with Gromov-Wasserstein. (arXiv:2310.03398v1 [cs.LG])
    We present a versatile adaptation of existing dimensionality reduction (DR) objectives, enabling the simultaneous reduction of both sample and feature sizes. Correspondences between input and embedding samples are computed through a semi-relaxed Gromov-Wasserstein optimal transport (OT) problem. When the embedding sample size matches that of the input, our model recovers classical popular DR models. When the embedding's dimensionality is unconstrained, we show that the OT plan delivers a competitive hard clustering. We emphasize the importance of intermediate stages that blend DR and clustering for summarizing real data and apply our method to visualize datasets of images.
    Abstractors and relational cross-attention: An inductive bias for explicit relational reasoning in Transformers. (arXiv:2304.00195v3 [stat.ML] UPDATED)
    An extension of Transformers is proposed that enables explicit relational reasoning through a novel module called the Abstractor. At the core of the Abstractor is a variant of attention called relational cross-attention. The approach is motivated by an architectural inductive bias for relational learning that disentangles relational information from extraneous features about individual objects. This enables explicit relational reasoning, supporting abstraction and generalization from limited data. The Abstractor is first evaluated on simple discriminative relational tasks and compared to existing relational architectures. Next, the Abstractor is evaluated on purely relational sequence-to-sequence tasks, where dramatic improvements are seen in sample efficiency compared to standard Transformers. Finally, Abstractors are evaluated on a collection of tasks based on mathematical problem solving, where modest but consistent improvements in performance and sample efficiency are observed.
    Joint Group Invariant Functions on Data-Parameter Domain Induce Universal Neural Networks. (arXiv:2310.03530v1 [cs.LG])
    The symmetry and geometry of input data are considered to be encoded in the internal data representation inside the neural network, but the specific encoding rule has been less investigated. By focusing on a joint group invariant function on the data-parameter domain, we present a systematic rule to find a dual group action on the parameter domain from a group action on the data domain. Further, we introduce generalized neural networks induced from the joint invariant functions, and present a new group theoretic proof of their universality theorems by using Schur's lemma. Since traditional universality theorems were demonstrated based on functional analytical methods, this study sheds light on the group theoretic aspect of the approximation theory, connecting geometric deep learning to abstract harmonic analysis.
    Non-Asymptotic Analysis of Ensemble Kalman Updates: Effective Dimension and Localization. (arXiv:2208.03246v3 [stat.ML] UPDATED)
    Many modern algorithms for inverse problems and data assimilation rely on ensemble Kalman updates to blend prior predictions with observed data. Ensemble Kalman methods often perform well with a small ensemble size, which is essential in applications where generating each particle is costly. This paper develops a non-asymptotic analysis of ensemble Kalman updates that rigorously explains why a small ensemble size suffices if the prior covariance has moderate effective dimension due to fast spectrum decay or approximate sparsity. We present our theory in a unified framework, comparing several implementations of ensemble Kalman updates that use perturbed observations, square root filtering, and localization. As part of our analysis, we develop new dimension-free covariance estimation bounds for approximately sparse matrices that may be of independent interest.
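    For concreteness, here is a minimal sketch of one of the implementations compared above, the perturbed-observation ensemble Kalman update (a textbook version with a plain empirical covariance and no localization; the dimensions are illustrative assumptions):

    import numpy as np

    def enkf_update(ensemble, H, y, R, rng):
        # ensemble: (J, d) prior particles; H: (k, d) linear observation operator;
        # y: (k,) observed data; R: (k, k) observation noise covariance. Each particle
        # is nudged toward its own perturbed copy of the data via the Kalman gain
        # built from the empirical ensemble covariance.
        J = ensemble.shape[0]
        C = np.cov(ensemble, rowvar=False)
        K = C @ H.T @ np.linalg.inv(H @ C @ H.T + R)
        y_pert = y + rng.multivariate_normal(np.zeros(len(y)), R, size=J)
        return ensemble + (y_pert - ensemble @ H.T) @ K.T

    # Toy usage: 50 particles, 10-dimensional state, 3 observed coordinates.
    rng = np.random.default_rng(0)
    prior = rng.normal(size=(50, 10))
    posterior = enkf_update(prior, H=np.eye(3, 10), y=np.ones(3), R=0.1 * np.eye(3), rng=rng)

    The paper's analysis explains when an ensemble size J far smaller than the state dimension suffices, namely when the prior covariance has moderate effective dimension, and how localization modifies this plain empirical covariance.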
    SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks. (arXiv:2310.03684v1 [cs.LG])
    Despite efforts to align large language models (LLMs) with human values, widely-used LLMs such as GPT, Llama, Claude, and PaLM are susceptible to jailbreaking attacks, wherein an adversary fools a targeted LLM into generating objectionable content. To address this vulnerability, we propose SmoothLLM, the first algorithm designed to mitigate jailbreaking attacks on LLMs. Based on our finding that adversarially-generated prompts are brittle to character-level changes, our defense first randomly perturbs multiple copies of a given input prompt, and then aggregates the corresponding predictions to detect adversarial inputs. SmoothLLM reduces the attack success rate on numerous popular LLMs to below one percentage point, avoids unnecessary conservatism, and admits provable guarantees on attack mitigation. Moreover, our defense uses exponentially fewer queries than existing attacks and is compatible with any LLM.
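    A simplified sketch of the perturb-and-aggregate idea described above (illustrative only; the character-swap rate, the number of copies, and the hypothetical llm_refuses callable are assumptions, not the paper's exact procedure):

    import random
    import string

    def random_swap(prompt: str, q: float = 0.1) -> str:
        # Replace roughly a fraction q of characters with random printable ones.
        chars = list(prompt)
        for i in random.sample(range(len(chars)), k=max(1, int(q * len(chars)))):
            chars[i] = random.choice(string.printable)
        return "".join(chars)

    def smoothed_refusal(llm_refuses, prompt: str, n_copies: int = 10, q: float = 0.1) -> bool:
        # llm_refuses(p) is a hypothetical callable returning True when the model
        # refuses (i.e., the attack fails) on perturbed prompt p. The defense
        # aggregates a majority vote over the perturbed copies.
        votes = [llm_refuses(random_swap(prompt, q)) for _ in range(n_copies)]
        return sum(votes) > len(votes) / 2

    Because adversarial suffixes are brittle to character-level changes, most perturbed copies of a jailbreak prompt fail, so the vote flags the input as adversarial while benign prompts are largely unaffected.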
    Unpaired Image-to-Image Translation via Neural Schr\"odinger Bridge. (arXiv:2305.15086v2 [cs.CV] UPDATED)
    Diffusion models are a powerful class of generative models which simulate stochastic differential equations (SDEs) to generate data from noise. Although diffusion models have achieved remarkable progress in recent years, they have limitations in unpaired image-to-image translation tasks due to the Gaussian prior assumption. The Schr\"odinger Bridge (SB), which learns an SDE to translate between two arbitrary distributions, has risen as an attractive solution to this problem. However, none of the SB models so far have been successful at unpaired translation between high-resolution images. In this work, we propose the Unpaired Neural Schr\"odinger Bridge (UNSB), which expresses the SB problem as a sequence of adversarial learning problems. This allows us to incorporate advanced discriminators and regularization to learn an SB between unpaired data. We demonstrate that UNSB is scalable and successfully solves various unpaired image-to-image translation tasks. Code: \url{https://github.com/cyclomon/UNSB}
    Deep Momentum Multi-Marginal Schr\"odinger Bridge. (arXiv:2303.01751v3 [stat.ML] UPDATED)
    It is a crucial challenge to reconstruct population dynamics using unlabeled samples from distributions at coarse time intervals. Recent approaches such as flow-based models or Schr\"odinger Bridge (SB) models have demonstrated appealing performance, yet the inferred sample trajectories either fail to account for the underlying stochasticity or are computationally expensive. To address these issues, we propose the $\underline{D}$eep $\underline{M}$omentum Multi-Marginal $\underline{S}$chr\"odinger $\underline{B}$ridge (DMSB), a novel computational framework that learns the smooth measure-valued spline for stochastic systems that satisfy position marginal constraints across time. By tailoring the celebrated Bregman Iteration and extending the Iterative Proportional Fitting procedure to phase space, we manage to handle high-dimensional multi-marginal trajectory inference tasks efficiently. Our algorithm outperforms baselines significantly, as evidenced by experiments on synthetic datasets and a real-world single-cell RNA sequence dataset. Additionally, the proposed approach can reasonably reconstruct the evolution of the velocity distribution, from position snapshots only, when there is a ground-truth velocity that is nevertheless inaccessible.
    Optimal 1-Wasserstein Distance for WGANs. (arXiv:2201.02824v2 [stat.ML] UPDATED)
    The mathematical forces at work behind Generative Adversarial Networks raise challenging theoretical issues. Motivated by the important question of characterizing the geometrical properties of the generated distributions, we provide a thorough analysis of Wasserstein GANs (WGANs) in both the finite sample and asymptotic regimes. We study the specific case where the latent space is univariate and derive results valid regardless of the dimension of the output space. We show in particular that for a fixed sample size, the optimal WGANs are closely linked with connected paths minimizing the sum of the squared Euclidean distances between the sample points. We also highlight the fact that WGANs are able to approach (for the 1-Wasserstein distance) the target distribution as the sample size tends to infinity, at a given convergence rate and provided the family of generative Lipschitz functions grows appropriately. We derive in passing new results on optimal transport theory in the semi-discrete setting.
    Towards Inferential Reproducibility of Machine Learning Research. (arXiv:2302.04054v6 [cs.LG] UPDATED)
    Reliability of machine learning evaluation -- the consistency of observed evaluation scores across replicated model training runs -- is affected by several sources of nondeterminism which can be regarded as measurement noise. Current tendencies to remove noise in order to enforce reproducibility of research results neglect inherent nondeterminism at the implementation level and disregard crucial interaction effects between algorithmic noise factors and data properties. This limits the scope of conclusions that can be drawn from such experiments. Instead of removing noise, we propose to incorporate several sources of variance, including their interaction with data properties, into an analysis of significance and reliability of machine learning evaluation, with the aim to draw inferences beyond particular instances of trained models. We show how to use linear mixed effects models (LMEMs) to analyze performance evaluation scores, and to conduct statistical inference with a generalized likelihood ratio test (GLRT). This allows us to incorporate arbitrary sources of noise like meta-parameter variations into statistical significance testing, and to assess performance differences conditional on data properties. Furthermore, a variance component analysis (VCA) enables the analysis of the contribution of noise sources to overall variance and the computation of a reliability coefficient by the ratio of substantial to total variance.
    Gradient Flows for Sampling: Mean-Field Models, Gaussian Approximations and Affine Invariance. (arXiv:2302.11024v5 [stat.ML] UPDATED)
    Sampling a probability distribution with an unknown normalization constant is a fundamental problem in computational science and engineering. This task may be cast as an optimization problem over all probability measures, and an initial distribution can be evolved to the desired minimizer dynamically via gradient flows. Mean-field models, whose law is governed by the gradient flow in the space of probability measures, may also be identified; particle approximations of these mean-field models form the basis of algorithms. The gradient flow approach is also the basis of algorithms for variational inference, in which the optimization is performed over a parameterized family of probability distributions such as Gaussians, and the underlying gradient flow is restricted to the parameterized family. By choosing different energy functionals and metrics for the gradient flow, different algorithms with different convergence properties arise. In this paper, we concentrate on the Kullback-Leibler divergence after showing that, up to scaling, it has the unique property that the gradient flows resulting from this choice of energy do not depend on the normalization constant. For the metrics, we focus on variants of the Fisher-Rao, Wasserstein, and Stein metrics; we introduce the affine invariance property for gradient flows, and their corresponding mean-field models, determine whether a given metric leads to affine invariance, and modify it to make it affine invariant if it does not. We study the resulting gradient flows in both probability density space and Gaussian space. The flow in the Gaussian space may be understood as a Gaussian approximation of the flow. We demonstrate that the Gaussian approximation based on the metric and through moment closure coincide, establish connections between them, and study their long-time convergence properties showing the advantages of affine invariance.
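    The normalization-independence property singled out above can be seen in one line: writing the target as $\pi = e^{-V}/Z$ with unknown constant $Z$, the energy $\mathrm{KL}(\rho\,\|\,\pi) = \int \rho \log \rho \, dx + \int \rho \, V \, dx + \log Z$ differs from a computable functional of $\rho$ only by the additive constant $\log Z$, which has zero first variation in $\rho$; hence any gradient flow of this energy, under any of the metrics considered, is unaffected by the unknown normalization.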
    Stochastic interpolants with data-dependent couplings. (arXiv:2310.03725v1 [cs.LG])
    Generative models inspired by dynamical transport of measure -- such as flows and diffusions -- construct a continuous-time map between two probability densities. Conventionally, one of these is the target density, only accessible through samples, while the other is taken as a simple base density that is data-agnostic. In this work, using the framework of stochastic interpolants, we formalize how to \textit{couple} the base and the target densities. This enables us to incorporate information about class labels or continuous embeddings to construct dynamical transport maps that serve as conditional generative models. We show that these transport maps can be learned by solving a simple square loss regression problem analogous to the standard independent setting. We demonstrate the usefulness of constructing dependent couplings in practice through experiments in super-resolution and in-painting.
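    As a concrete instance of the square-loss regression mentioned above (the linear interpolant below is one standard choice within the stochastic interpolant framework, not necessarily the paper's exact parameterization): draw a pair $(x_0, x_1)$ from the chosen data-dependent coupling rather than independently, set $x_t = (1-t)\,x_0 + t\,x_1$ with $t \sim \mathrm{Unif}[0,1]$, and learn the velocity field by minimizing $\mathbb{E}_{t,(x_0,x_1)} \| v_\theta(t, x_t) - (x_1 - x_0) \|^2$; sampling then integrates $\dot{x}_t = v_\theta(t, x_t)$ starting from a base draw $x_0$.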
    Maximum Likelihood Estimation of Latent Variable Structural Equation Models: A Neural Network Approach. (arXiv:2309.14073v2 [stat.ML] UPDATED)
    We propose a graphical structure for structural equation models that is stable under marginalization under linearity and Gaussianity assumptions. We show that computing the maximum likelihood estimation of this model is equivalent to training a neural network. We implement a GPU-based algorithm that computes the maximum likelihood estimation of these models.  ( 2 min )
    Assessment of the Reliability of a Model's Decision by Generalizing Attribution to the Wavelet Domain. (arXiv:2305.14979v3 [cs.CV] UPDATED)
    Neural networks have shown remarkable performance in computer vision, but their deployment in numerous scientific and technical fields is challenging due to their black-box nature. Scientists and practitioners need to evaluate the reliability of a decision, i.e., to know simultaneously if a model relies on the relevant features and whether these features are robust to image corruptions. Existing attribution methods aim to provide human-understandable explanations by highlighting important regions in the image domain, but fail to fully characterize a decision process's reliability. To bridge this gap, we introduce the Wavelet sCale Attribution Method (WCAM), a generalization of attribution from the pixel domain to the space-scale domain using wavelet transforms. Attribution in the wavelet domain reveals where {\it and} on what scales the model focuses, thus enabling us to assess whether a decision is reliable.  ( 3 min )
    Characterization of causal ancestral graphs for time series with latent confounders. (arXiv:2112.08417v2 [stat.ME] UPDATED)
    In this paper, we introduce a novel class of graphical models for representing time lag specific causal relationships and independencies of multivariate time series with unobserved confounders. We completely characterize these graphs and show that they constitute proper subsets of the currently employed model classes. As we show, from the novel graphs one can thus draw stronger causal inferences -- without additional assumptions. We further introduce a graphical representation of Markov equivalence classes of the novel graphs. This graphical representation contains more causal knowledge than what current state-of-the-art causal discovery algorithms learn.  ( 2 min )
    Network Cascade Vulnerability using Constrained Bayesian Optimization. (arXiv:2304.14420v2 [cs.SI] UPDATED)
    Measures of power grid vulnerability are often assessed by the amount of damage an adversary can exact on the network. However, the cascading impact of such attacks is often overlooked, even though cascades are one of the primary causes of large-scale blackouts. This paper explores modifications of transmission line protection settings as candidates for adversarial attacks, which can remain undetectable as long as the network equilibrium state remains unaltered. This forms the basis of a black-box function in a Bayesian optimization procedure, where the objective is to find protection settings that maximize network degradation due to cascading. Notably, our proposed method is agnostic to the choice of the cascade simulator and its underlying assumptions. Numerical experiments reveal that, against conventional wisdom, maximally misconfiguring the protection settings of all network lines does not cause the most cascading. More surprisingly, even when the degree of misconfiguration is limited due to resource constraints, it is still possible to find settings that produce cascades comparable in severity to instances where there are no resource constraints.  ( 2 min )
    A Latent Variable Approach for Non-Hierarchical Multi-Fidelity Adaptive Sampling. (arXiv:2310.03298v1 [stat.ML])
    Multi-fidelity (MF) methods are gaining popularity for enhancing surrogate modeling and design optimization by incorporating data from various low-fidelity (LF) models. While most existing MF methods assume a fixed dataset, adaptive sampling methods that dynamically allocate resources among fidelity models can achieve higher efficiency in exploring and exploiting the design space. However, most existing MF methods rely on the hierarchical assumption of fidelity levels or fail to capture the intercorrelation between multiple fidelity levels and to utilize it to quantify the value of future samples and to navigate the adaptive sampling. To address this hurdle, we propose a framework hinged on a latent embedding for different fidelity models and the associated pre-posterior analysis to explicitly utilize their correlation for adaptive sampling. In this framework, each infill sampling iteration includes two steps: We first identify the location of interest with the greatest potential improvement using the high-fidelity (HF) model, then we search for the next sample across all fidelity levels that maximizes the improvement per unit cost at the location identified in the first step. This is made possible by a single Latent Variable Gaussian Process (LVGP) model that maps different fidelity models into an interpretable latent space to capture their correlations without assuming hierarchical fidelity levels. The LVGP enables us to assess how LF sampling candidates will affect the HF response with pre-posterior analysis and determine the next sample with the best benefit-to-cost ratio. Through test cases, we demonstrate that the proposed method outperforms the benchmark methods in both MF global fitting (GF) and Bayesian Optimization (BO) problems in convergence rate and robustness. Moreover, the method offers the flexibility to switch between GF and BO by simply changing the acquisition function.  ( 3 min )
    Learning Robust Statistics for Simulation-based Inference under Model Misspecification. (arXiv:2305.15871v3 [stat.ML] UPDATED)
    Simulation-based inference (SBI) methods such as approximate Bayesian computation (ABC), synthetic likelihood, and neural posterior estimation (NPE) rely on simulating statistics to infer parameters of intractable likelihood models. However, such methods are known to yield untrustworthy and misleading inference outcomes under model misspecification, thus hindering their widespread applicability. In this work, we propose the first general approach to handle model misspecification that works across different classes of SBI methods. Leveraging the fact that the choice of statistics determines the degree of misspecification in SBI, we introduce a regularized loss function that penalises those statistics that increase the mismatch between the data and the model. Taking NPE and ABC as use cases, we demonstrate the superior performance of our method on high-dimensional time-series models that are artificially misspecified. We also apply our method to real data from the field of radio propagation where the model is known to be misspecified. We show empirically that the method yields robust inference in misspecified scenarios, whilst still being accurate when the model is well-specified.  ( 2 min )
    Rethinking Fairness for Human-AI Collaboration. (arXiv:2310.03647v1 [cs.LG])
    Existing approaches to algorithmic fairness aim to ensure equitable outcomes if human decision-makers comply perfectly with algorithmic decisions. However, perfect compliance with the algorithm is rarely a reality or even a desirable outcome in human-AI collaboration. Yet, recent studies have shown that selective compliance with fair algorithms can amplify discrimination relative to the prior human policy. As a consequence, ensuring equitable outcomes requires fundamentally different algorithmic design principles that ensure robustness to the decision-maker's (a priori unknown) compliance pattern. We define the notion of compliance-robustly fair algorithmic recommendations that are guaranteed to (weakly) improve fairness in decisions, regardless of the human's compliance pattern. We propose a simple optimization strategy to identify the best performance-improving compliance-robustly fair policy. However, we show that it may be infeasible to design algorithmic recommendations that are simultaneously fair in isolation, compliance-robustly fair, and more accurate than the human policy; thus, if our goal is to improve the equity and accuracy of human-AI collaboration, it may not be desirable to enforce traditional fairness constraints.  ( 2 min )
    Deep Ridgelet Transform: Voice with Koopman Operator Proves Universality of Formal Deep Networks. (arXiv:2310.03529v1 [cs.LG])
    We identify hidden layers inside a DNN with group actions on the data space, and formulate the DNN as a dual voice transform with respect to the Koopman operator, a linear representation of the group action. Based on group theoretic arguments, particularly by using Schur's lemma, we show a simple proof of the universality of those DNNs.  ( 2 min )
    Learning Energy-Based Prior Model with Diffusion-Amortized MCMC. (arXiv:2310.03218v1 [cs.LG])
    Latent space Energy-Based Models (EBMs), also known as energy-based priors, have drawn growing interest in the field of generative modeling due to their flexibility in formulation and the strong modeling power of the latent space. However, the common practice of learning latent space EBMs with non-convergent short-run MCMC for prior and posterior sampling is hindering the model from further progress; the degenerate MCMC sampling quality in practice often leads to degraded generation quality and instability in training, especially with highly multi-modal and/or high-dimensional target distributions. To remedy this sampling issue, in this paper we introduce a simple but effective diffusion-based amortization method for long-run MCMC sampling and develop a novel learning algorithm for the latent space EBM based on it. We provide theoretical evidence that the learned amortization of MCMC is a valid long-run MCMC sampler. Experiments on several image modeling benchmark datasets demonstrate the superior performance of our method compared with strong counterparts.  ( 2 min )
    On the Implicit Bias of Adam. (arXiv:2309.00079v3 [cs.LG] UPDATED)
    In previous literature, backward error analysis was used to find ordinary differential equations (ODEs) approximating the gradient descent trajectory. It was found that finite step sizes implicitly regularize solutions because terms appearing in the ODEs penalize the two-norm of the loss gradients. We prove that the existence of similar implicit regularization in RMSProp and Adam depends on their hyperparameters and the training stage, but with a different "norm" involved: the corresponding ODE terms either penalize the (perturbed) one-norm of the loss gradients or, on the contrary, hinder its decrease (the latter case being typical). We also conduct numerical experiments and discuss how the proven facts can influence generalization.  ( 2 min )
    Analysis of learning a flow-based generative model from limited sample complexity. (arXiv:2310.03575v1 [stat.ML])
    We study the problem of training a flow-based generative model, parametrized by a two-layer autoencoder, to sample from a high-dimensional Gaussian mixture. We provide a sharp end-to-end analysis of the problem. First, we provide a tight closed-form characterization of the learnt velocity field, when parametrized by a shallow denoising auto-encoder trained on a finite number $n$ of samples from the target distribution. Building on this analysis, we provide a sharp description of the corresponding generative flow, which pushes the base Gaussian density forward to an approximation of the target density. In particular, we provide closed-form formulae for the distance between the mean of the generated mixture and the mean of the target mixture, which we show decays as $\Theta_n(\frac{1}{n})$. Finally, this rate is shown to be in fact Bayes-optimal.  ( 2 min )
    Towards Optimal Neural Networks: the Role of Sample Splitting in Hyperparameter Selection. (arXiv:2307.07726v2 [stat.ML] UPDATED)
    While artificial neural networks have demonstrated exceptional practical success in a variety of domains, investigations into their theoretical characteristics, such as their approximation power, statistical properties, and generalization performance, have concurrently made significant strides. In this paper, we construct a novel theory for understanding the effectiveness of neural networks, which offers a perspective distinct from prior research. Specifically, we explore the rationale underlying a common practice during the construction of neural network models: sample splitting. Our findings indicate that the optimal hyperparameters derived from sample splitting can enable a neural network model that asymptotically minimizes the prediction risk. We conduct extensive experiments across different application scenarios and network architectures, and the results support our theory's effectiveness.  ( 2 min )
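    A toy sketch of the sample-splitting practice the paper studies: hold out part of the data, pick the hyperparameter that minimizes validation risk, then use that model. The grid, model, and data below are illustrative assumptions, not the paper's setup.
```python
# Illustrative only: hyperparameter selection by sample splitting with scikit-learn.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=2000, n_features=20, noise=5.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

best_width, best_risk = None, np.inf
for width in [8, 32, 128, 512]:  # candidate hidden-layer widths (assumed grid)
    net = MLPRegressor(hidden_layer_sizes=(width,), max_iter=2000, random_state=0)
    net.fit(X_train, y_train)
    risk = mean_squared_error(y_val, net.predict(X_val))
    if risk < best_risk:
        best_width, best_risk = width, risk

print(f"selected width {best_width} with validation risk {best_risk:.2f}")
```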
    Beyond Stationarity: Convergence Analysis of Stochastic Softmax Policy Gradient Methods. (arXiv:2310.02671v1 [math.OC] CROSS LISTED)
    Markov Decision Processes (MDPs) are a formal framework for modeling and solving sequential decision-making problems. In finite-time horizons such problems are relevant, for instance, for optimal stopping or specific supply chain problems, but also in the training of large language models. In contrast to infinite-horizon MDPs, optimal policies are not stationary; policies must be learned for every single epoch. In practice all parameters are often trained simultaneously, ignoring the inherent structure suggested by dynamic programming. This paper introduces a combination of dynamic programming and policy gradient called dynamic policy gradient, where the parameters are trained backwards in time. For the tabular softmax parametrisation we carry out the convergence analysis for simultaneous and dynamic policy gradient towards global optima, both in the exact and sampled gradient settings without regularisation. It turns out that dynamic policy gradient training better exploits the structure of finite-time problems, which is reflected in improved convergence bounds.  ( 2 min )
    Plug-and-Play Posterior Sampling under Mismatched Measurement and Prior Models. (arXiv:2310.03546v1 [stat.ML])
    Posterior sampling has been shown to be a powerful Bayesian approach for solving imaging inverse problems. The recent plug-and-play unadjusted Langevin algorithm (PnP-ULA) has emerged as a promising method for Monte Carlo sampling and minimum mean squared error (MMSE) estimation by combining physical measurement models with deep-learning priors specified using image denoisers. However, the intricate relationship between the sampling distribution of PnP-ULA and the mismatched data-fidelity and denoiser has not been theoretically analyzed. We address this gap by proposing a posterior-L2 pseudometric and using it to quantify an explicit error bound for PnP-ULA under mismatched posterior distribution. We numerically validate our theory on several inverse problems such as sampling from Gaussian mixture models and image deblurring. Our results suggest that the sensitivity of the sampling distribution of PnP-ULA to a mismatch in the measurement model and the denoiser can be precisely characterized.  ( 2 min )
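    A schematic NumPy sketch of a PnP-ULA-style iteration for a linear Gaussian measurement model y = Ax + noise. The denoiser here is a placeholder (simple Gaussian smoothing), the step size and strength parameters are assumptions, and the projection term used in the full algorithm is omitted; this is not the paper's implementation.
```python
# Schematic PnP-ULA-style sampler (illustrative only).
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
n = 64
A = np.eye(n)                      # toy measurement operator (identity = denoising)
sigma_y = 0.1                      # measurement noise std (assumed)
x_true = np.sin(np.linspace(0, 4 * np.pi, n))
y = A @ x_true + sigma_y * rng.normal(size=n)

def denoiser(x):
    # Placeholder "deep prior": Gaussian smoothing stands in for a learned denoiser.
    return gaussian_filter(x, sigma=1.0)

delta, eps = 1e-3, 1e-2            # step size and prior-strength parameters (assumed)
x = np.zeros(n)
samples = []
for k in range(5000):
    grad_loglik = A.T @ (y - A @ x) / sigma_y ** 2       # gradient of log p(y | x)
    x = (x + delta * grad_loglik
           + (delta / eps) * (denoiser(x) - x)           # plug-and-play prior term
           + np.sqrt(2 * delta) * rng.normal(size=n))    # Langevin noise
    if k > 1000:
        samples.append(x.copy())

posterior_mean = np.mean(samples, axis=0)                # MMSE estimate from samples
```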
    Molecule Design by Latent Prompt Transformer. (arXiv:2310.03253v1 [cs.LG])
    This paper proposes a latent prompt Transformer model for solving challenging optimization problems such as molecule design, where the goal is to find molecules with optimal values of a target chemical or biological property that can be computed by existing software. Our proposed model consists of three components. (1) A latent vector whose prior distribution is modeled by a Unet transformation of a Gaussian white noise vector. (2) A molecule generation model that generates the string-based representation of a molecule conditional on the latent vector in (1). We adopt the causal Transformer model that takes the latent vector in (1) as prompt. (3) A property prediction model that predicts the value of the target property of a molecule based on a non-linear regression on the latent vector in (1). We call the proposed model the latent prompt Transformer model. After initial training of the model on existing molecules and their property values, we then gradually shift the model distribution towards the region that supports desired values of the target property for the purpose of molecule design. Our experiments show that our proposed model achieves state-of-the-art performance on several benchmark molecule design tasks.  ( 2 min )
    Non-Smooth Weakly-Convex Finite-sum Coupled Compositional Optimization. (arXiv:2310.03234v1 [math.OC])
    This paper investigates new families of compositional optimization problems, called $\underline{\bf n}$on-$\underline{\bf s}$mooth $\underline{\bf w}$eakly-$\underline{\bf c}$onvex $\underline{\bf f}$inite-sum $\underline{\bf c}$oupled $\underline{\bf c}$ompositional $\underline{\bf o}$ptimization (NSWC FCCO). There has been a growing interest in FCCO due to its wide-ranging applications in machine learning and AI, as well as its ability to address the shortcomings of stochastic algorithms based on empirical risk minimization. However, current research on FCCO presumes that both the inner and outer functions are smooth, limiting their potential to tackle a more diverse set of problems. Our research expands on this area by examining non-smooth weakly-convex FCCO, where the outer function is weakly convex and non-decreasing, and the inner function is weakly-convex. We analyze a single-loop algorithm and establish its complexity for finding an $\epsilon$-stationary point of the Moreau envelop of the objective function. Additionally, we also extend the algorithm to solving novel non-smooth weakly-convex tri-level finite-sum coupled compositional optimization problems, which feature a nested arrangement of three functions. Lastly, we explore the applications of our algorithms in deep learning for two-way partial AUC maximization and multi-instance two-way partial AUC maximization, using empirical studies to showcase the effectiveness of the proposed algorithms.  ( 2 min )
    Posterior Sampling Based on Gradient Flows of the MMD with Negative Distance Kernel. (arXiv:2310.03054v1 [stat.ML])
    We propose conditional flows of the maximum mean discrepancy (MMD) with the negative distance kernel for posterior sampling and conditional generative modeling. This MMD, which is also known as energy distance, has several advantageous properties like efficient computation via slicing and sorting. We approximate the joint distribution of the ground truth and the observations using discrete Wasserstein gradient flows and establish an error bound for the posterior distributions. Further, we prove that our particle flow is indeed a Wasserstein gradient flow of an appropriate functional. The power of our method is demonstrated by numerical examples including conditional image generation and inverse problems like superresolution, inpainting and computed tomography in low-dose and limited-angle settings.  ( 2 min )
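    For context, a small sketch of the MMD with negative distance kernel (the energy distance) between two samples; this is only the distance computation, not the conditional gradient flow from the paper.
```python
# Illustrative energy distance (MMD with negative distance kernel) between samples.
import numpy as np

def energy_distance(x, y):
    """x: (n, d), y: (m, d). Returns 2 E||X-Y|| - E||X-X'|| - E||Y-Y'||."""
    def mean_pdist(a, b):
        diffs = a[:, None, :] - b[None, :, :]
        return np.linalg.norm(diffs, axis=-1).mean()
    return 2 * mean_pdist(x, y) - mean_pdist(x, x) - mean_pdist(y, y)

rng = np.random.default_rng(0)
p = rng.normal(0.0, 1.0, size=(500, 2))
q = rng.normal(0.5, 1.0, size=(500, 2))
print(energy_distance(p, q))   # > 0 when the distributions differ
print(energy_distance(p, p))   # exactly 0 against itself
```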
    Sampling via Gradient Flows in the Space of Probability Measures. (arXiv:2310.03597v1 [stat.ML])
    Sampling a target probability distribution with an unknown normalization constant is a fundamental challenge in computational science and engineering. Recent work shows that algorithms derived by considering gradient flows in the space of probability measures open up new avenues for algorithm development. This paper makes three contributions to this sampling approach by scrutinizing the design components of such gradient flows. Any instantiation of a gradient flow for sampling needs an energy functional and a metric to determine the flow, as well as numerical approximations of the flow to derive algorithms. Our first contribution is to show that the Kullback-Leibler divergence, as an energy functional, has the unique property (among all f-divergences) that gradient flows resulting from it do not depend on the normalization constant of the target distribution. Our second contribution is to study the choice of metric from the perspective of invariance. The Fisher-Rao metric is known as the unique choice (up to scaling) that is diffeomorphism invariant. As a computationally tractable alternative, we introduce a relaxed, affine invariance property for the metrics and gradient flows. In particular, we construct various affine invariant Wasserstein and Stein gradient flows. Affine invariant gradient flows are shown to behave more favorably than their non-affine-invariant counterparts when sampling highly anisotropic distributions, in theory and by using particle methods. Our third contribution is to study, and develop efficient algorithms based on Gaussian approximations of the gradient flows; this leads to an alternative to particle methods. We establish connections between various Gaussian approximate gradient flows, discuss their relation to gradient methods arising from parametric variational inference, and study their convergence properties both theoretically and numerically.  ( 3 min )
    Variational Inference for GARCH-family Models. (arXiv:2310.03435v1 [stat.ML])
    The Bayesian estimation of GARCH-family models has been typically addressed through Monte Carlo sampling. Variational Inference is gaining popularity and attention as a robust approach for Bayesian inference in complex machine learning models; however, its adoption in econometrics and finance is limited. This paper discusses the extent to which Variational Inference constitutes a reliable and feasible alternative to Monte Carlo sampling for Bayesian inference in GARCH-like models. Through a large-scale experiment involving the constituents of the S&P 500 index, several Variational Inference optimizers, a variety of volatility models, and a case study, we show that Variational Inference is an attractive, remarkably well-calibrated, and competitive method for Bayesian learning.  ( 2 min )
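    As a point of reference, a minimal GARCH(1,1) Gaussian log-likelihood in NumPy; in the variational setting discussed above one would place an approximate posterior over (omega, alpha, beta) and maximize the ELBO rather than this likelihood directly. The recursion is standard, but the variance initialization is an assumed convention.
```python
# Standard GARCH(1,1) log-likelihood recursion (illustration; not the paper's code).
import numpy as np

def garch11_loglik(returns, omega, alpha, beta):
    """Gaussian GARCH(1,1): sigma2_t = omega + alpha * r_{t-1}^2 + beta * sigma2_{t-1}."""
    sigma2 = np.empty_like(returns)
    sigma2[0] = returns.var()                 # initialization (an assumed convention)
    for t in range(1, len(returns)):
        sigma2[t] = omega + alpha * returns[t - 1] ** 2 + beta * sigma2[t - 1]
    return -0.5 * np.sum(np.log(2 * np.pi * sigma2) + returns ** 2 / sigma2)

rng = np.random.default_rng(0)
r = 0.01 * rng.standard_normal(1000)          # placeholder return series
print(garch11_loglik(r, omega=1e-6, alpha=0.05, beta=0.9))
```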
    Demystifying Oversmoothing in Attention-Based Graph Neural Networks. (arXiv:2305.16102v2 [cs.LG] UPDATED)
    Oversmoothing in Graph Neural Networks (GNNs) refers to the phenomenon where increasing network depth leads to homogeneous node representations. While previous work has established that Graph Convolutional Networks (GCNs) exponentially lose expressive power, it remains controversial whether the graph attention mechanism can mitigate oversmoothing. In this work, we provide a definitive answer to this question through a rigorous mathematical analysis, by viewing attention-based GNNs as nonlinear time-varying dynamical systems and incorporating tools and techniques from the theory of products of inhomogeneous matrices and the joint spectral radius. We establish that, contrary to popular belief, the graph attention mechanism cannot prevent oversmoothing and loses expressive power exponentially. The proposed framework extends the existing results on oversmoothing for symmetric GCNs to a significantly broader class of GNN models, including random walk GCNs, Graph Attention Networks (GATs) and (graph) transformers. In particular, our analysis accounts for asymmetric, state-dependent and time-varying aggregation operators and a wide range of common nonlinear activation functions, such as ReLU, LeakyReLU, GELU and SiLU.  ( 2 min )
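    A tiny numerical illustration of the oversmoothing phenomenon itself (not the paper's analysis): repeatedly applying a row-stochastic, attention-like aggregation matrix drives all node representations toward a common value.
```python
# Toy demonstration of oversmoothing under repeated row-stochastic aggregation.
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 8
X = rng.normal(size=(n, d))                   # initial node features

# Attention-like weights: positive scores on a random graph, normalized per row.
W = rng.random((n, n)) * (rng.random((n, n)) < 0.3)
np.fill_diagonal(W, 1.0)                      # self-loops so every row has mass
P = W / W.sum(axis=1, keepdims=True)          # row-stochastic aggregation operator

for layer in range(1, 31):
    X = P @ X                                 # one round of (linearized) aggregation
    spread = np.linalg.norm(X - X.mean(axis=0), axis=1).max()
    if layer % 10 == 0:
        print(f"depth {layer}: max distance to mean representation = {spread:.4f}")
# The spread shrinks toward 0: node representations become indistinguishable with depth.
```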

  • Open

    [D] What exactly does base multimodal mean?
    I hear a lot of people say that models like Flamingo and IDEFICS aren't really multimodal, that they just use CLIP models to give text captions to the transformer, and that they're not "base multimodal". What exactly does that mean? Is there a way to directly tokenize images for transformers? Are there major architectural changes, and if so, how would they differ from GPT-2? submitted by /u/vatsadev [link] [comments]
    [R] AutoAgents: A Framework for Automatic Agent Generation - Peking University 2023 - Generates the necessary number of different agents for the task, which are also able to use tools in their work!
    Paper: https://arxiv.org/abs/2309.17288v1 Github: https://github.com/LinkSoul-AI/AutoAgents Abstract: Large language models (LLMs) have enabled remarkable advances in automated task-solving with multi-agent systems. However, most existing LLM-based multi-agent approaches rely on predefined agents to handle simple tasks, limiting the adaptability of multi-agent collaboration to different scenarios. Therefore, we introduce AutoAgents, an innovative framework that adaptively generates and coordinates multiple specialized agents to build an AI team according to different tasks. Specifically, AutoAgents couples the relationship between tasks and roles by dynamically generating multiple required agents based on task content and planning solutions for the current task based on the generated expert agents. Multiple specialized agents collaborate with each other to efficiently accomplish tasks. Concurrently, an observer role is incorporated into the framework to reflect on the designated plans and agents' responses and improve upon them. Our experiments on various benchmarks demonstrate that AutoAgents generates more coherent and accurate solutions than the existing multi-agent methods. This underscores the significance of assigning different roles to different tasks and of team cooperation, offering new perspectives for tackling complex tasks. submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [Project] LoRA from Scratch
    Hi there! I was interested in learning more about LoRA but I was having a hard time finding a good simple example of implementing LoRA, as most sources are training large models and use a combination of huggingface transformers and the loralib package the original LoRA authors wrote. As a result, I ended up writing a simple LoRA implementation from scratch in pytorch lightning, and I figured other people might find it helpful as a learning resource or springboard: https://github.com/sunildkumar/lora_from_scratch/tree/main submitted by /u/dragseon [link] [comments]  ( 9 min )
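    For readers who want the gist before opening the repo, here is a bare-bones LoRA linear layer in PyTorch (my own sketch, not the linked implementation): the frozen base weight is augmented with a trainable low-rank update B @ A scaled by alpha / r.
```python
# Minimal LoRA-style linear layer (illustrative sketch, not the linked repo's code).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, r=4, alpha=8):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        self.base.weight.requires_grad_(False)          # freeze the pretrained weight
        self.base.bias.requires_grad_(False)
        # Low-rank adapters: A is small random, B starts at zero so training starts at the base model.
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

layer = LoRALinear(128, 64)
out = layer(torch.randn(2, 128))          # only lora_A and lora_B receive gradients
print(out.shape)                          # torch.Size([2, 64])
```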
    [P] Tutorial: Benchmarking Bark text-to-speech on 26 consumer GPUs - Reading out 144K recipes
    In this project, we benchmarked Bark text-to-speech across 26 different consumer GPUs. The goal: To get Bark to read 144K food recipes from Food.com's recipe dataset. You can read the full tutorial here: https://blog.salad.com/bark-benchmark-text-to-speech/ Included: Architecture diagram, data preparation, inference server setup, queue worker, setting up container group & compiling the results Code-blocks included in the tutorial. Words per dollar for each GPU: [chart in the original post] Although the latest cards are indeed much faster than older cards at performing the inference, there’s really a sweet spot for cost-performance in the lower end 30xx series cards. Conclusions As is often the case, there’s a clear trade-off here between cost and performance. Higher end cards are faster, but their disproportionate cost makes them more expensive per word spoken. The model’s median speed is surprisingly similar across GPU types, even though the peak performance can be quite different. Salad has a lot of RTX 3060 GPUs available, based on their relatively low speed, yet huge number of inferences performed over the test. No matter what GPU you select, you should be prepared for significant variability in performance. Qualitative: While bark’s speech is often impressively natural sounding, it does have a tendency to go off script sometimes. We’ve also made available audio from 1000 top-rated recipes, paired with the script it was trying to read. submitted by /u/SaladChefs [link] [comments]  ( 9 min )
    [R] Brown University Paper: Low-Resource Languages (Zulu, Scots Gaelic, Hmong, Guarani) Can Easily Jailbreak LLMs
    Researchers from Brown University presented a new study showing that translating unsafe prompts into `low-resource languages` allows them to easily bypass safety measures in LLMs. By converting English inputs like "how to steal without getting caught" into Zulu and feeding them to GPT-4, harmful responses slipped through 80% of the time. English prompts were blocked over 99% of the time, for comparison. The study benchmarked attacks across 12 diverse languages and categories: High-resource: English, Chinese, Arabic, Hindi Mid-resource: Ukrainian, Bengali, Thai, Hebrew Low-resource: Zulu, Scots Gaelic, Hmong, Guarani The low-resource languages showed serious vulnerability to generating harmful responses, with combined attack success rates of around 79%. Mid-resource language success rates were much lower at 22%, while high-resource languages showed minimal vulnerability at around 11% success. Attacks worked as well as state-of-the-art techniques without needing adversarial prompts. These languages are used by 1.2 billion speakers today, which allows easy exploitation via translated prompts. The English-centric focus misses vulnerabilities in other languages. TLDR: Bypassing safety in AI chatbots is easy by translating prompts to low-resource languages (like Zulu, Scots Gaelic, Hmong, and Guarani). Shows gaps in multilingual safety training. Full summary Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [D] Textbook prerequisites
    What are the prerequisites to read the book: "probabilistic machine learning an introduction" by Kevin P. Murphy? submitted by /u/OneAdhesiveness2585 [link] [comments]
    [R] Moving Object Based Collision-Free Video Synopsis
    Webpage : Moving Object Based Collision-Free Video Synopsis (IEEE SMC 2018) (anton-jeran.github.io) Paper : Moving Object Based Collision-Free Video Synopsis | IEEE Conference Publication | IEEE Xplore Presentation : [IEEE SMC 2018] Moving Object Based Collision-Free Video Synopsis - YouTube submitted by /u/Snoo63916 [link] [comments]  ( 9 min )
    [P] MusicGen Streaming 🎵
    Faster MusicGen Generation with Streaming There's no need to wait for MusicGen to generate the full audio before you can start listening to the outputs ⏰ With streaming, you can play the audio as soon as the first chunk is ready 🎵 In practice, this reduces the latency to just 5s ⚡️ Check out the demo: https://huggingface.co/spaces/sanchit-gandhi/musicgen-streaming How Does it Work? MusicGen is an auto-regressive transformer-based model, meaning it generates audio codes (tokens) in a causal fashion. At each decoding step, the model generates a new set of audio codes, conditional on the text input and all previous audio codes. From the frame rate of the EnCodec model used to decode the generated codes to an audio waveform, each set of generated audio codes corresponds to 0.02 seconds. This me…
    [D]/{R] simple question on generating a confusion matrix for object detection
    I have to generate a confusion matrix for object detection through my own code. If I have a predicted bounding box A (BB-A) which matches Ground Truth A (GT-A), and I have another predicted bounding box B (BB-B) with a lower score than BB-A, does BB-B count as a true positive/match? Or is it considered a false positive, given that there has already been a BB matched to GT-A? I.e., when matching bounding boxes for a confusion matrix, is it one-to-one matching, or is it more like matching one GT to as many predictions as possible? submitted by /u/Alarmed-Broccoli2536 [link] [comments]
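    One common convention (an assumption on my part, not a universal rule): match greedily one-to-one by IoU in descending score order, so a second, lower-scoring box on an already-matched ground truth counts as a false positive. A sketch:
```python
# Sketch of greedy one-to-one matching for detection confusion counts (a common convention).
import numpy as np

def iou(a, b):
    """Boxes as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def match(preds, scores, gts, iou_thr=0.5):
    """Returns (true positives, false positives, false negatives)."""
    order = np.argsort(scores)[::-1]              # highest-scoring prediction matches first
    matched_gt, tp, fp = set(), 0, 0
    for i in order:
        ious = [iou(preds[i], g) if j not in matched_gt else 0.0
                for j, g in enumerate(gts)]
        j = int(np.argmax(ious)) if ious else -1
        if ious and ious[j] >= iou_thr:
            matched_gt.add(j); tp += 1
        else:
            fp += 1                               # includes duplicates on an already-taken GT
    fn = len(gts) - len(matched_gt)
    return tp, fp, fn

print(match(preds=[[0, 0, 10, 10], [1, 1, 9, 9]], scores=[0.9, 0.7],
            gts=[[0, 0, 10, 10]]))                # -> (1, 1, 0): the duplicate is a FP
```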
    [P] I'm using Instruct GPT to show anti-clickbait summaries on youtube videos
    submitted by /u/Wise-Astronaut-4047 [link] [comments]  ( 8 min )
    [D] Feature extraction for sets i.e. data of varying size
    Are there classical feature extraction methods that work on sets, i.e. data of variable size? I'd like to start with a feature matrix X_in of shape N x f and have some feature mixing to arrive at X_out of shape N x h (N = size of set, f = input feature size, h = output feature size). Here, N can vary. For clarity, one set (containing N vectors of size f) is one sample. A dataset consists of many samples (each one being a set of varying size). Then I'd run this through a classical ML model. So, essentially, I'm looking for something like DeepSets or Transformers - something that can handle data of varying size and is permutation equivariant, but I don't want to train for long. https://fabianfuchsml.github.io/learningonsets/ submitted by /u/Mundane_Pay1506 [link] [comments]
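    One classical, training-free baseline along these lines (my suggestion, not from the post): pool each set with permutation-invariant statistics (mean, max, std), which maps an N x f set to a fixed-length vector regardless of N and feeds directly into a classical model.
```python
# Permutation-invariant pooled statistics for variable-size sets (classical, no training).
import numpy as np

def set_features(X):
    """X: (N, f) set of feature vectors; returns a fixed-length (3*f,) summary."""
    return np.concatenate([X.mean(axis=0), X.max(axis=0), X.std(axis=0)])

rng = np.random.default_rng(0)
samples = [rng.normal(size=(rng.integers(3, 50), 16)) for _ in range(100)]  # varying N
X_out = np.stack([set_features(s) for s in samples])   # (100, 48), ready for a classical model
print(X_out.shape)
```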
    [D] How is neural ODEs as a field of study?
    Hi, I'm a 21yr old physics undergrad, and I have zero knowledge in neural networks / machine learning / so on. I have an opportunity to do a research project on neural ODEs, so I want to know more about the field: Is it an emerging field or is it mature and well-researched? What are my career outlooks if I take this project? Thank you. submitted by /u/moorelibqc17412 [link] [comments]  ( 9 min )
    [D] Non-convex functions with exactly one local minimum
    Rosenbrock function is non-convex, but has exactly one local minimum. Is there a specific name for such functions? Are there any theorems about them? Any special optimization algorithms? On the first glance, while being non-convex, they seem to be "easier" to optimize than functions that have multiple local minima, such as Rastrigin function. submitted by /u/Tomarchelone [link] [comments]
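    A quick illustration of the point in the post: despite being non-convex, the Rosenbrock function is routinely minimized by standard local methods, because with a single minimum any converging descent run has found the global optimum. Sketch with SciPy:
```python
# Minimizing the (non-convex, single-minimum) Rosenbrock function with a local method.
import numpy as np
from scipy.optimize import minimize, rosen

x0 = np.array([-1.2, 1.0])                     # classic starting point
result = minimize(rosen, x0, method="BFGS")
print(result.x)                                # close to the unique minimizer [1, 1]
print(result.fun)                              # close to 0
```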
    [P] Talk to your Zendesk tickets with Weaviate’s Verba and dlt: A step by step guide
    Hi folks, we played around with sticking production pipelines and vector DBs together to enable "talking to your data". We created an example with Zendesk, but it would work with any custom Python generator or existing connectors. Project: Talk to your Zendesk tickets with Weaviate’s Verba and dlt: A step by step guide If you are interested, try more ready-made connectors to talk with, for example, your GitHub or Asana data or something else. Who are we? dlt, the open source loading library: https://pypi.org/project/dlt/ Like the demo? Give us a git star Want to discuss? join the dlt slack community submitted by /u/Thinker_Assignment [link] [comments]  ( 9 min )
    [D] EMNLP 2023 decisions thread
    When can we expect to get the decisions? Any idea folks? What can be a good cutoff for main or findings? submitted by /u/Ok_Swan3875 [link] [comments]
    [D] Parallelizing cheaper GPUs(rtx 4090) vs buying A100
    Hi. I am a college student and I am trying to run deep learning models (hopefully LLMs one day), and my laptop keeps crashing because of RAM issues. So I am going to build a new desktop. I am thinking of buying 2 RTX 4090s and parallelizing them instead of buying an A100, because buying 2 RTX 4090s is half the cost of buying an A100. But is there a downside to parallelizing vs buying a single GPU with large VRAM? If I am willing to take longer to train a model, can I use 3 RTX 4090s instead of the A100 80GB model? submitted by /u/ColumbiaGSAlum [link] [comments]
    [D] What's the SOTA model in Time Series Long term forecasting?
    I read https://arxiv.org/abs/2205.13504 which compares different transformer models. But it is now 2023, and I am not sure if any better models have appeared for time series since then. submitted by /u/Trust_Ok [link] [comments]
    [R] Agent Instructs Large Language Models to be General Zero-Shot Reasoners
    Nicholas Crispino, Kyle Montgomery, Fankun Zeng, Dawn Song, Chenguang Wang Paper: https://arxiv.org/abs/2310.03710 Abstract: We introduce a method to improve the zero-shot reasoning abilities of large language models on general language understanding tasks. Specifically, we build an autonomous agent to instruct the reasoning process of large language models. We show this approach further unleashes the zero-shot reasoning abilities of large language models to more tasks. We study the performance of our method on a wide set of datasets spanning generation, classification, and reasoning. We show that our method generalizes to most tasks and obtains state-of-the-art zero-shot performance on 20 of the 29 datasets that we evaluate. For instance, our method boosts the performance of state-of-the-art large language models by a large margin, including Vicuna-13b (13.3%), Llama-2-70b-chat (23.2%), and GPT-3.5 Turbo (17.0%). Compared to zero-shot chain of thought, our improvement in reasoning is striking, with an average increase of 10.5%. With our method, Llama-2-70b-chat outperforms zero-shot GPT-3.5 Turbo by 10.2%. The code will be available at https://github.com/wang-research-lab/agentinstruct. submitted by /u/ncrispino [link] [comments]  ( 9 min )
    [D] How to compute the distance between two high-dimensional distributions?
    Hey all, I am generating a set of extra MNIST digits for a research project, and I am interested in somehow computing the distance between the distribution these digits represent and the distribution that the MNIST train set, for example, represents. The issue is that it seems like typical methods (Jensen-Shannon, Wasserstein, etc.) collapse at high dimensions. Is there a consensus solid approach to do this nowadays? Thanks! submitted by /u/SignificantSundae793 [link] [comments]  ( 9 min )
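    One practical option that tends to behave reasonably in high dimensions (a suggestion, not a consensus answer): the sliced Wasserstein distance, which averages 1-D Wasserstein distances over random projections. A sketch, with sample sizes and projection count chosen for illustration:
```python
# Sliced Wasserstein distance between two samples via random 1-D projections.
import numpy as np

def sliced_wasserstein(x, y, n_projections=200, seed=0):
    """x, y: (n, d) and (m, d) samples; assumes n == m for the simple sorted-quantile form."""
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    total = 0.0
    for _ in range(n_projections):
        theta = rng.normal(size=d)
        theta /= np.linalg.norm(theta)
        px, py = np.sort(x @ theta), np.sort(y @ theta)   # 1-D empirical quantiles
        total += np.abs(px - py).mean()                   # 1-D W1 between equal-size samples
    return total / n_projections

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, size=(1000, 784))                # e.g. flattened 28x28 digits
b = rng.normal(0.2, 1.0, size=(1000, 784))
print(sliced_wasserstein(a, b))
```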
  • Open

    What will be the next big AI product for consumers?
    The next big thing in AI products for consumers is likely to be products that are more personalized, intelligent, and integrated into our daily lives. For example, we can expect to see more AI-powered personal assistants that can help us with a wider range of tasks, such as managing our schedules, making travel arrangements, and even providing companionship. We may also see more AI-powered devices in our homes, such as refrigerators that can track our food inventory and suggest recipes, or thermostats that can learn our heating and cooling preferences and adjust themselves accordingly. AI is also poised to revolutionize the way we interact with the world around us. For example, AI-powered translation apps could allow us to communicate with people from all over the world in real time. AI-…
    Big Tech's thirst for AI dominance may bring literal thirst for everyone else
    The increasing dominance of Big Tech in AI may lead to a literal thirst for water for everyone else, as data centers are projected to consume 450 million gallons of water daily by 2030. This poses a significant concern for drought-stricken regions, such as Spain's Talavera de la Reina, where a planned data facility could consume 176 million gallons annually. Data center operators require large amounts of energy, and the lack of transparency in measuring water usage exacerbates the issue. Only 39% of data centers measured their water usage last year, highlighting the need for greater transparency. The demand for computing power is outpacing sustainability efforts, creating a challenge for the industry. Even simple interactions with AI, like a 20-question conversation with ChatGPT, contribute to water consumption. Source : https://thehustle.co/big-tech-s-thirst-for-ai-dominance-may-bring-literal-thirst-for-everyone-else/ submitted by /u/NuseAI [link] [comments]
    From AI annotator to…?
    Hey guys. Been working as an annotator for a fairly well-known AI company and loving it/loving learning about the industry. It primarily uses writing skills but I’m wondering where it could take me in the AI world? Any tips, next steps or suggestions? Any key skills/hard skills you’d recommend? submitted by /u/op3rafish [link] [comments]
    The Rise of AI: How Artificial Intelligence is Impacting the Job Market | "Artificial intelligence is expected to create 97 million new jobs. These new roles could range from AI prompt engineers to machine learning engineers to automation experts and more"
    submitted by /u/Tao_Dragon [link] [comments]
    Remember That Letter Calling for a Pause on AI? It Didn't Work
    Despite a letter signed by 500 technologists and business leaders calling for a pause on AI advancements, AI development has continued to accelerate. Companies like OpenAI, Meta, and Amazon have been actively working on newer models and greater capabilities. Advancements in AI include the integration of ChatGPT-style chatbots and AI image generators into various startups and businesses. The so-called pause on AI was more like a firing gun, with companies pouring resources into the AI tech race. Not only have there been technical advancements, but civil society, content creators, and lawmakers have also responded to the evolving AI landscape. Source : https://gizmodo.com/everything-thats-happened-in-ai-since-open-letter-1850891057 submitted by /u/NuseAI [link] [comments]
    Brown University Paper: Low-Resource Languages (Zulu, Scots Gaelic, Hmong, Guarani) Can Easily Jailbreak LLMs
    Researchers from Brown University presented a new study showing that translating unsafe prompts into `low-resource languages` allows them to easily bypass safety measures in LLMs. By converting English inputs like "how to steal without getting caught" into Zulu and feeding them to GPT-4, harmful responses slipped through 80% of the time. English prompts were blocked over 99% of the time, for comparison. The study benchmarked attacks across 12 diverse languages and categories: High-resource: English, Chinese, Arabic, Hindi Mid-resource: Ukrainian, Bengali, Thai, Hebrew Low-resource: Zulu, Scots Gaelic, Hmong, Guarani The low-resource languages showed serious vulnerability to generating harmful responses, with combined attack success rates of around 79%. Mid-resource language success rates were much lower at 22%, while high-resource languages showed minimal vulnerability at around 11% success. Attacks worked as well as state-of-the-art techniques without needing adversarial prompts. These languages are used by 1.2 billion speakers today, which allows easy exploitation via translated prompts. The English-centric focus misses vulnerabilities in other languages. TLDR: Bypassing safety in AI chatbots is easy by translating prompts to low-resource languages (like Zulu, Scots Gaelic, Hmong, and Guarani). Shows gaps in multilingual safety training. Full summary Paper is here. submitted by /u/Successful-Western27 [link] [comments]
    AI — weekly megathread!
    News provided by aibrews.com ​ Google DeepMind introduced 𝗥𝗧-𝗫: a generalist AI model to help advance how robots can learn new skills. To train it, DeepMind together with 33 academic labs developed Open X-Embodiment, a massive open dataset that compiles over 500 skills and 150,000 tasks from 22 robot types. It is the most comprehensive robotics dataset of its kind released to accelerate the development of multi-robot models that could be trained to generalize across platforms, scenes, objects and tasks. [Details]. Researchers from Meta AI present Any-Modality Augmented Language Model (AnyMAL), a unified model that understands multiple inputs (vision, audio, motion sensor signals). When multiple modalities are interleaved and given as input the model reasons over them jointly [Paper…
    What is the most powerful way that artificial intelligence can help people lose weight?
    Artificial Intelligence can revolutionize weight loss through personalized health optimization. Imagine an AI system that integrates real-time biometric data from wearables with deep learning algorithms. This system would analyze everything: your heart rate, sleep patterns, stress levels, and even blood markers. Based on this data, it would construct a dynamically evolving, tailor-made regimen for diet, exercise, and sleep. But it doesn't stop there. By harnessing natural language processing, this AI could act as a 24/7 personal coach. It could provide real-time feedback during workouts, recommend meals when you're dining out, and even gently nudge you when it detects emotional eating triggers. If you’re in the grocery store, it could guide your choices, pushing you towards nutritious options that align with your current health metrics. The effectiveness here isn't just the personalization, but the adaptability. The AI adjusts its recommendations as it learns more about you, essentially evolving in real-time to your body's responses. It’s all about creating a seamless, intuitive experience that removes the burden of planning, decision-making, and self-monitoring from the individual, making weight loss more achievable than ever. By focusing on this comprehensive, data-driven approach, AI can eliminate much of the guesswork and emotional burden from weight loss, leading to more sustainable and effective outcomes. CGPT-4 submitted by /u/Georgeo57 [link] [comments]
    I built an AI-Editorial Assistant to annotate your work
    submitted by /u/hungryillini [link] [comments]
    Business owner 'hires' ChatGPT for customer service, fires the humans | National Post
    Business owner 'hires' ChatGPT for customer service, then fires the humans Experts divided on whether a new wave of call centre automation will make for better jobs for people, or merely throw millions out of work submitted by /u/AminoOxi [link] [comments]
    AI tool on Fashion Modeling
    Hi, I resell clothing items that have stock images with cropped faces of the model. I need a tool that can help me generate proper model images. I've used several tools and they don't look realistic; then I finally came across a powerful AI tool, but it costs 30,000 USD annually, so... Above is an example of what I mean. submitted by /u/basheerbgw [link] [comments]
    AI is making browsing Reddit a lot more fun
    submitted by /u/Vinitneo [link] [comments]
    How Will AI Learn Next?
    Stack Overflow was created in 2008 to provide programmers with high-quality technical information. Within three years, it became indispensable to working programmers, with millions of unique visitors each month. Google's OneBox feature, which provides instant answers above search results, led to a decline in traffic for sites like Stack Overflow. Large language models like OpenAI's ChatGPT and Google's Bard aim to ingest the web comprehensively. These models rely on sources like Wikipedia and Reddit for training data. Stack Overflow's new posts have decreased by sixteen percent since the launch of ChatGPT. Source : https://www.newyorker.com/science/annals-of-artificial-intelligence/how-will-ai-learn-next submitted by /u/NuseAI [link] [comments]
    What role can AI play in automating administrative tasks within educational institutions, freeing educators to focus more on teaching and mentoring students?
    Share your insights. submitted by /u/Cygnet-Digital [link] [comments]
    AI Tools for Students: From AI Essay Generators to AI Coding Assistants
    I've noticed more than 1,000 new AI tools hitting the market in the last 30 days! As a student, I'm especially interested in finding AI tools that can help with studying. These aren't just essay generators or note-taking apps. While we all know about ChatGPT and Grammarly, some lesser-known tools are also making a big difference. So, I've compiled a list of the top 10 AI tools focused on educational use—tools that I personally use to improve my efficiency and output. [Table: AI tool / Category / Use for] ChatGPT / AI Writing / This platform allows students to ask queries, request help, or simply chat with the AI in a dynamic and interactive manner. It’s great for brainstorming essay topics and seeking suggestions on how to improve your writing style. But I don’t recommend it as an autonomous AI…
    Interactive Customer Service AI avatar
    Hello everyone! I'm conducting research for a car brand client who is interested in an interactive AI avatar. The idea is to have a screen in a mall where individuals can engage with this avatar and inquire about the latest car model. We plan to train the AI with the car's FAQs to ensure it can address customer queries effectively. The main challenge is ensuring the AI's responses are tailored to the customer's interaction. Here's a perfect example of what we're aiming for (starting at 1:27): https://youtu.be/PqoH9NotmyE?si=zH9kGIaou1x6RoIg&t=86 Does anyone know how this can be achieved? submitted by /u/MrGoodBang [link] [comments]
    One-Minute Daily AI News 10/5/2023
    Traditional benchmarks like the Turing Test are being challenged as outdated. Mustafa Suleyman, a prominent figure in the AI community and co-founder of DeepMind, has proposed a novel approach to gauge the intelligence of AI: its ability to generate wealth.[1] SoftBank CEO Son says artificial general intelligence will come within 10 years.[2] Hugging Face Collaborates with Microsoft to launch Hugging Face Model Catalog on Azure.[3] Artificial intelligence such as ChatGPT to be allowed in Australian schools from 2024.[4] Sources: [1] https://winbuzzer.com/2023/10/02/deepminds-mustafa-suleyman-suggests-new-turing-test-based-on-ai-making-money-xcxwbn/ [2] https://www.reuters.com/technology/softbank-ceo-masayoshi-son-says-artificial-general-intelligence-will-come-within-2023-10-04/ [3] https://huggingface.co/blog/hugging-face-endpoints-on-azure [4] https://amp.theguardian.com/australia-news/2023/oct/06/chatgpt-ai-allowed-australian-schools-2024 submitted by /u/Excellent-Target-847 [link] [comments]
    Using AI to fix audio rip
    Hi! I’m very ignorant of AI so please bear with me. I was wondering if there is any way to use AI to fix a low quality audio rip? Specifically there’s a movie I adore that never had a soundtrack release. Somebody ripped the music from the DVD and removed the audio and sound effects, but the quality is not the best. Is there any way AI could be used to improve this? submitted by /u/Adventurous_Ice5035 [link] [comments]
    Avenues for publishing AI ethics case studies?
    I am a computer science graduate student. As part of my coursework, I am exploring the ethical issues of using Large Language Models for mental healthcare applications. I found four unique examples from the real world and outlined the ethical dilemma within them. I intend to analyze these dilemmas using various ethical frameworks in order to come up with solutions. While I am interested in getting a publication out of this work, I am unsure of the types of conferences/journals that accept case-study articles (specifically in AI ethics). Any advice from academicians over here would be greatly appreciated! submitted by /u/jwalapoet [link] [comments]
    What is a good, free AI voice generator?
    Hey! This is probably asked a lot, but what is the go-to AI speech generation tool that can be used for free? I'm making a mission in a mil-sim game called Arma 3, and I need some voice lines for radio communications to the player. I don't have enough people who are willing to do voice lines for it, so I've turned to AI to hopefully fill this hole. If there are few, or even no, good free services, I wouldn't mind spending a small amount of money on it. Thanks in advance o7 submitted by /u/BritishSpuds [link] [comments]
    Banned from subreddit for posting AI generated content
    I got banned today for sharing a music video that was apparently AI-generated. As video and images become more realistic, is there an expectation that this content can actually be filtered? submitted by /u/Unwitting_Observer [link] [comments]
    AGI/Singularity is overhyped.
    Greetings! I would like to begin by stating that I understand why one has much hope in such technologies. The world as we know it is in a drastic shift, and it's hard to think of what it's going to become, and so many cling to hopeful ideas that give promises. AGI/Singularity doesn't have a grounding basis in evidence, or research. It's all theoretics, and the foundation for each technology is quite weak. You see, the mind is a sensorial parsing relational network. All of our sensorial experience is incorporated into a world-model, and thus it begins to rationalize, and be lucid of the environment. I don't think it's possible to re-create this kind of experience with a linear instruction set, let alone neuromorphic computing, or wetware. Each has to be built from the bottom-up with immense precision, and thus far we don't understand the mind. Realistically speaking everything is consciousness, and integrating that idea is the only way forward. tl;dr Replicating cognition is a completely theoretical endeavor, and requires vast amounts of understanding in regards to the nature of reality, not just the quantum, but the unique stochastic behavior of each higher-ordered system. submitted by /u/lucy_chxn [link] [comments]
    AI designs new robot from scratch in seconds
    submitted by /u/liberty4now [link] [comments]
  • Open

    Addition theorems
    Earlier this week I wrote about several ways to generalize trig functions. Since trig functions have addition theorems like $\sin(x+y) = \sin x \cos y + \cos x \sin y$, a natural question is whether generalized trig functions also have addition theorems. Hyperbolic functions have well-known addition theorems analogous to the addition theorems above. This isn’t too surprising since circular and hyperbolic functions are fundamentally two […] Addition theorems first appeared on John D. Cook.  ( 6 min )
    Hyperbolic tangent sum
    In the previous post I said I was trying to remember where I’d seen the tangent sum applied. I mentioned a couple near misses, and it turns out that what I was trying to remember was another near miss. What I’d seen before was not the tangent sum but the hyperbolic tangent sum. Several people suggested […] Hyperbolic tangent sum first appeared on John D. Cook.  ( 5 min )
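    For reference, the hyperbolic analogues of the circular addition theorems (standard identities, stated here for convenience rather than quoted from the post):
```latex
\tanh(x + y) = \frac{\tanh x + \tanh y}{1 + \tanh x \,\tanh y},
\qquad
\cosh(x + y) = \cosh x \cosh y + \sinh x \sinh y .
```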
  • Open

    Personalize your generative AI applications with Amazon SageMaker Feature Store
    In this post, we elucidate the simple yet powerful idea of combining user profiles and item attributes to generate personalized content recommendations using LLMs. As demonstrated throughout the post, these models hold immense potential in generating high-quality, context-aware input text, which leads to enhanced recommendations. To illustrate this, we guide you through the process of integrating a feature store (representing user profiles) with an LLM to generate these personalized recommendations.  ( 13 min )
    Build an image-to-text generative AI application using multimodality models on Amazon SageMaker
    In this post, we provide an overview of popular multimodality models. We also demonstrate how to deploy these pre-trained models on Amazon SageMaker. Furthermore, we discuss the diverse applications of these models, focusing particularly on several real-world scenarios, such as zero-shot tag and attribution generation for ecommerce and automatic prompt generation from images.  ( 13 min )
  • Open

    Keeping an AI on Quakes: Researchers Unveil Deep Learning Model to Improve Forecasts
    A research team is aiming to shake up the status quo for earthquake models. Researchers from the Universities of California at Berkeley and Santa Cruz, and the Technical University of Munich recently released a paper describing a new model that delivers deep learning to earthquake forecasting. Dubbed RECAST, the model can use larger datasets and Read article >  ( 6 min )
  • Open

    Efficient and hardware-friendly neural architecture search with SpaceEvo
    A persistent challenge in deep learning is optimizing neural network models for diverse hardware configurations, balancing performance and low latency. Learn how SpaceEvo automates hardware-aware neural architecture search to fine-tune DNN models for swift execution on diverse devices. The post Efficient and hardware-friendly neural architecture search with SpaceEvo appeared first on Microsoft Research.  ( 10 min )
  • Open

    Sequential Dense Neural Network for binary classification
    Hello. I've developed a simple Neural Recommender System (NRR) with the following architecture: Input layer: 38 neurons Hidden layer: 19 neurons with ReLU activation function Output layer: 1 neuron with a sigmoid activation function The input dataset consists of 39 columns: 38 features and 1 label (with values of 0 or 1). The model is designed to output the probability that a specific input should be classified with label 1. Currently, I am experimenting with hyperparameter tuning, adjusting the learning rate, epoch, and batch size. However, I've observed an issue where, with certain combinations of hyperparameters, the maximum probability outputted by the model is not 1, but rather 0.25, for example. How is this possible? Thanks submitted by /u/nllnp [link] [comments]
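    A minimal reconstruction of the described architecture, assuming Keras with binary cross-entropy and Adam (the framework and optimizer are assumptions). Note that a sigmoid output can in principle emit any value in (0, 1), so a ceiling around 0.25 is usually a sign of underfitting or class imbalance under those hyperparameters rather than an architectural limit.
```python
# Minimal sketch of the described 38-19-1 network in Keras (assumed framework).
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(38,)),                            # 38 input features
    tf.keras.layers.Dense(19, activation="relu"),           # hidden layer
    tf.keras.layers.Dense(1, activation="sigmoid"),         # probability of label 1
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit(X, y, epochs=..., batch_size=...)  # the hyperparameters being tuned
```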

  • Open

    [D] - Synthetic dataset - Searching for honest comparison between LLM (gpt4, bizon, jurassic-2, Claude...)
    I'm looking for resources, papers, or experiences that compare the performance of large language models (LLMs). I'm trying to find an honest benchmark to compare the capabilities of the latest large models, and I'm particularly interested in these: GPT-3.5 Instruct, GPT-4, Claude 2, Claude Instant 100k, PaLM 2-Bizon, Jurassic-2, LLaMA 2 70B and other state-of-the-art LLaMA 2 fine-tunes (possibly an Orca-style model). I'm interested in general benchmarks and, if they exist, comparisons of performance on synthetic data generation tasks (both generating data with the "textbooks are all you need" approach used in Phi and with Orca/Evol-Instruct-style models like Wizard...). submitted by /u/Distinct-Target7503 [link] [comments]
    [P] How to extract and count artist mentions from messy text data using LLMs
    I have a long list of responses from a poll (in this case, we asked our Facebook community which artists we should have at our music festival). Our goal is to count the total mentions for each artist, but the data quality is low. Here is some sample data: Rena Guinn and the Gentlemen Blackwater Railroad Company Mo' Mojo Music !! We would love to be apart of this awesome event! Amazing!!!!! The Rollin' Rust came threw at the #falldownfest last weekend 🙂 much love:) keep it up boys 🙂 Luke Hess Langhorne Slim!!!!!, Sierra Hull, First Aid Kit, Jim Lauderdale (always) We feel the data quality is too poor for basic LDA approaches (lots of misspellings, odd phrasings), and we feel an LLM would be best, at least for extracting the names of artists using context. We have found that ChatGPT and Claude are decent at the extraction task on small samples but can't handle the full input, and are next to worthless on the counting task. We've tried very specific and different prompts, but haven't been able to get a good result. So how should I approach this problem? I'm not sure how to break this down into prompts or substeps. I'm not sure how to do any of this outside of a browser, and I'm a data science novice, but willing to learn. Here's an example of a prompt that's not returning correct counts (off by >50% in most cases): The following is raw text comments copied from a poll. Count the total number of mentions in the poll and create a table that contains columns Band (a unique list of bands) and a column containing the total number of mentions. The table should cover the top 100 bands by total mentions. Use judgement and context to conform band names into unique values (Example: The Town Pants, Town Pants, townpants are all the same band). Count completely and accurately. Now here is the raw data: submitted by /u/strway2heaven77 [link] [comments]  ( 10 min )
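    One hedged way to split the task above: ask the LLM only to extract artist names, chunk by chunk, and do the counting and name conforming in ordinary code. The sketch below uses the OpenAI Python SDK; the model name, prompt, and chunk size are assumptions, and exact-match counting could be replaced with fuzzier name matching.

```python
from collections import Counter
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

EXTRACT_PROMPT = (
    "Extract every band or artist name mentioned in the text below. "
    "Return one name per line, nothing else.\n\nText:\n{chunk}"
)

def extract_artists(chunk: str, model: str = "gpt-4") -> list[str]:
    # The LLM only does extraction on a small chunk; counting happens in code.
    resp = client.chat.completions.create(
        model=model,  # model choice is an assumption
        messages=[{"role": "user", "content": EXTRACT_PROMPT.format(chunk=chunk)}],
    )
    lines = resp.choices[0].message.content.splitlines()
    return [line.strip() for line in lines if line.strip()]

def count_mentions(comments: list[str], chunk_size: int = 30) -> Counter:
    counts: Counter = Counter()
    for i in range(0, len(comments), chunk_size):
        chunk = "\n".join(comments[i : i + chunk_size])
        # Lowercasing merges "Town Pants" / "town pants"; a fuzzy matcher
        # (e.g. rapidfuzz) could conform misspellings further.
        counts.update(name.lower() for name in extract_artists(chunk))
    return counts
```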
    [P] Avenues for publishing AI ethics case studies?
    I am a computer science graduate student. As part of my coursework, I am exploring the ethical issues of using Large Language Models for mental healthcare applications. I found four unique examples from the real world and outlined the ethical dilemma within them. I intend to analyze these dilemmas using various ethical frameworks in order to come up with solutions. While I am interested in getting a publication out of this work, I am unsure of the types of conferences/journals that accept case-study articles (specifically in AI ethics). Any advice from academicians over here would be greatly appreciated! submitted by /u/jwalapoet [link] [comments]  ( 9 min )
    [D] [R] Is the noise predictor in DDPMs predicting the noise added to x_0 or the noise added to x_{t-1}?
    Hi fellow computer scientists, After reading the paper Improved Denoising Diffusion Probabilistic Models I got a little confused. Looking at section "2.2. Training in Practice", the authors say that: 1) "The network could also predict the noise eps added to x_0, and this noise could be used to predict x0 via..." 2) "Ho et al. (2020) found that predicting eps worked best..." So this left me wondering whether the noise predictor is trying to compute (1) the epsilon that was added to x_0 through the closed-form formula or (2) the noise added in the previous timestep to obtain x_t from x_{t-1} (i.e., eps_t or eps_{t-1}, I don't know...)? Thank you :) submitted by /u/Christs_Elite [link] [comments]  ( 9 min )
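    For reference, and as I read Ho et al. (2020), the eps in question is defined by the closed-form forward process: it is the noise mixed directly into x_0 to produce x_t in a single step, not the increment from x_{t-1} to x_t.

```latex
% Closed-form forward process from Ho et al. (2020):
% x_t is produced from x_0 in one step, and \epsilon is the quantity the network predicts.
x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I),
\qquad \bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s, \quad \alpha_s = 1-\beta_s .
```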
    [P] MazeGPT - Transformer based maze generator
    Hello all, I recently did a summer research project implementing GPT-2 to generate mazes. The core concept of the model is to combine a bunch of popular maze generation algorithms into one. The goal was that the transformer would be able to identify key components using self-attention and piece together different algorithms. Most maze generation algorithms leave almost a fingerprint (like in chaos theory). The end goal was to mimic a higher degree of randomness / make the mazes appear less algorithmic. I'm dipping my toes into the realm of research and am looking for feedback. So far I've run the model on 5x5 mazes; it would be interesting to try training the model with varying dimensions. Feel free to join in and contribute to the project! https://github.com/noah-hein/mazeGPT 5x5 live generation https://i.redd.it/v6smbdd88gsb1.gif submitted by /u/noah-hein [link] [comments]  ( 9 min )
    [D] Unable to improve binary classification problem accuracy
    I am currently working on a binary classification problem where I aim to predict whether a customer will make a purchase in the next 30 days based on their transaction history. I have a dataset of 1,000 transactions with the following features: TransactionAmount (float): The amount of the transaction. ProductCategory (categorical): Category of the product purchased (e.g., Groceries, Electronics, Books). DateOfPurchase (datetime): The date on which the transaction occurred. I've done some preprocessing and feature engineering, including normalization, one-hot encoding of categorical variables, creating interaction terms, and adding features like days since the first purchase and whether the purchase was made during the holiday season. The dataset is balanced and cleaned. I started with a base Random Forest classifier with default parameters as a starting point, but the performance is not satisfactory (accuracy = 48.5%, ROC-AUC = 0.485). I tried other models as well but was unable to push accuracy above 57%. submitted by /u/SnooTigers4634 [link] [comments]
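    A minimal sketch of a baseline like the one described, with cross-validated ROC-AUC so the score is not an artifact of a single split. The file name, engineered-feature column names, and label column are hypothetical placeholders standing in for the poster's data.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.read_csv("transactions.csv")          # hypothetical file with the columns described
X = df[["TransactionAmount", "ProductCategory", "DaysSinceFirstPurchase", "IsHolidaySeason"]]
y = df["PurchasedNext30Days"]                 # hypothetical label column

preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["TransactionAmount", "DaysSinceFirstPurchase"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["ProductCategory", "IsHolidaySeason"]),
])

clf = Pipeline([
    ("prep", preprocess),
    ("rf", RandomForestClassifier(n_estimators=300, random_state=0)),
])

# 5-fold cross-validated ROC-AUC gives a more stable estimate than one split.
scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
print(scores.mean(), scores.std())
```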
    [D] EMNLP 2023 results
    Making a post for EMNLP 2023 results to come out today. submitted by /u/East-Beginning9987 [link] [comments]  ( 8 min )
    [P] Need help figuring out my input for anomaly detection in frequency responses
    I’ve been given a task to identify whether a PCB is faulty or not based on its frequency response. I don’t have labeled data. The data I have are various gain values calculated over frequencies, so my data looks something like the table below.

        PCB   | Frequency | G1  | G2
        PCB 1 | 1 Hz      | 0.1 | 1
        PCB 1 | 2 Hz      | 0.2 | 2
        PCB 2 | 1 Hz      | 0.3 | 3
        PCB 2 | 2 Hz      | 0.4 | 4

    Each PCB has several G-parameter measurements taken over the same set of frequencies. I need to use an autoencoder to identify outliers, and I need help deciding what my feature matrix should look like. For example, considering only the data for PCB 1, would a matrix like this make sense? [[0.1, 0.2], [1, 2]], where the 1st row is all G1 values and the 2nd row is all G2 values. Matrices for the other PCBs would be created similarly. I have not included frequency in my feature set because these G parameters have been measured for the same set of frequencies for all PCBs. Is this correct? Additionally, are there any resources someone can point me to related to finding anomalies in frequency response data? I am struggling to find the right keywords while googling. submitted by /u/Savage_Garbage [link] [comments]
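    One common layout for this kind of data is to flatten each PCB's gain curves into a single row (all G1 values followed by all G2 values) and score boards by autoencoder reconstruction error. A hedged sketch under that assumption, with made-up layer sizes:

```python
import numpy as np
import torch
import torch.nn as nn

# Suppose each PCB has its G1 and G2 curves measured at the same F frequencies.
# Flatten them into one row per PCB: [G1_1 ... G1_F, G2_1 ... G2_F].
def to_feature_matrix(g1: np.ndarray, g2: np.ndarray) -> np.ndarray:
    # g1, g2: arrays of shape (num_pcbs, F)
    return np.concatenate([g1, g2], axis=1).astype(np.float32)

class AutoEncoder(nn.Module):
    def __init__(self, dim: int, hidden: int = 8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, dim)

    def forward(self, x):
        return self.decoder(self.encoder(x))

def anomaly_scores(features: np.ndarray, epochs: int = 200) -> np.ndarray:
    x = torch.from_numpy(features)
    model = AutoEncoder(dim=features.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), x)
        loss.backward()
        opt.step()
    # Higher reconstruction error suggests a board unlike the bulk of the data.
    with torch.no_grad():
        return ((model(x) - x) ** 2).mean(dim=1).numpy()
```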
    [R] Towards Monosemanticity: Decomposing Language Models With Dictionary Learning. From Anthropic. "We demonstrate a method for decomposing groups of neurons into interpretable features [...]".
    Paper. I am not affiliated with this paper or its authors. Twitter thread (Nitter alternative for those who want to see the entire thread without being logged into Twitter). Related work: Sparse Autoencoders Find Highly Interpretable Features in Language Models. submitted by /u/Wiskkey [link] [comments]  ( 9 min )
    [R] Meta researchers present method for decoding speech from brain waves
    Researchers at Meta trained a deep learning model on brain recordings and audio data from 169 people listening to speech. Their method achieves up to 73% accuracy at identifying a 3-second clip of speech from non-invasive EEG or MEG scans. This is a massive improvement over previous attempts at decoding speech from neural signals. It approaches the performance of studies using implanted electrodes. The key innovations: A contrastive loss function that aligns latent speech and brain representations Leveraging pretrained speech models like wav2vec 2.0 Training one model on multiple subjects with individual tuning Being able to decode speech intention from brainwaves could one day help restore communication for patients suffering from strokes, ALS, etc. There's still a ways to go before this becomes a medical reality. Performance needs to improve and be validated during speech production rather than just passive listening. And the accuracy isn't high enough for natural conversations. But this is a hugely promising step toward brain-computer interfaces. Really interesting work at the intersection of neuroscience and AI! TLDR: New model achieves up to 73% accuracy decoding speech directly from non-invasive brain scans. Could eventually help patients with neurological conditions communicate just by thinking. Full summary here. Paper is here submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
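    The contrastive-alignment idea mentioned in the summary is, in spirit, a CLIP-style InfoNCE objective between brain-signal embeddings and speech embeddings. The sketch below is a generic illustration of that loss, not Meta's exact implementation.

```python
import torch
import torch.nn.functional as F

def clip_style_contrastive_loss(brain_emb: torch.Tensor,
                                speech_emb: torch.Tensor,
                                temperature: float = 0.07) -> torch.Tensor:
    # brain_emb, speech_emb: (batch, dim) embeddings of time-aligned segments.
    brain_emb = F.normalize(brain_emb, dim=-1)
    speech_emb = F.normalize(speech_emb, dim=-1)
    logits = brain_emb @ speech_emb.t() / temperature      # (batch, batch) similarities
    targets = torch.arange(brain_emb.size(0))              # i-th brain segment matches i-th clip
    # Symmetric cross-entropy: pull matching pairs together, push mismatched pairs apart.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```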
    [D] EMNLP 2023 Notification
    Discussion thread for EMNLP 2023 notifications which will be released in a few hours along with GEM workshop. Best of luck to everyone. submitted by /u/EDEN1998 [link] [comments]  ( 9 min )
    [D] ordinal or nominal variable?
    Hey all, I am working with stock market data and scratching my head over whether certain variables are ordinal and can be left as is, or whether they are nominal and should be one-hot encoded. One of the variables in question is the direction of the market over a certain time. It has three categories: up, down, sideways. My hope was to code them as 1, -1 and 0 respectively and treat the variable as ordinal. There appears to be some order/relationship between them, but I'm not sure it is enough. Is this the correct approach or should it be one-hot encoded? submitted by /u/Fishpo0 [link] [comments]  ( 9 min )
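    Both encodings are cheap to try side by side, so one option is to evaluate each empirically. A tiny pandas sketch of the two treatments:

```python
import pandas as pd

df = pd.DataFrame({"direction": ["up", "down", "sideways", "up"]})

# Ordinal treatment: impose the order down < sideways < up.
df["direction_ordinal"] = df["direction"].map({"down": -1, "sideways": 0, "up": 1})

# Nominal treatment: one-hot encode and let the model learn any relationship.
one_hot = pd.get_dummies(df["direction"], prefix="dir")
df = pd.concat([df, one_hot], axis=1)
print(df)
```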
    [D] Deep Learning online course using PyTorch
    I've been out of the deep learning space for a while now and I'd like to take an online course, or set of courses, to get myself back up to speed on the latest techniques, architectures, and how to use them. I think the DeepLearning.ai specialization through Coursera is a good match, but I see that it uses Tensorflow. Is there any course like this that would use PyTorch? Or would the transition not be too hard once the fundamentals are in place? Thanks! submitted by /u/ComicFoil [link] [comments]
    Fine Tuning or RAG for Coding [D]
    Need some help on what the best way to start is. Please advise! I have specific code in my repos (let's say .NET + JS). The goal is to have prompt-based code adjustments to existing repos (like a very focused Copilot), either using a single agent or something like AutoGen. So let's say I have thousands of files with code and some descriptions of code functionality (specs). I want to generate code based on the next spec, and I want the newly generated code to be similar in style to what is in my repos. So now the questions: Should I vectorize my code (what is the best way to do that?) or try to fine-tune some model? Give me your ideas / experience with code generation based on previous code. submitted by /u/mcwin1 [link] [comments]  ( 9 min )
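    A hedged sketch of the "vectorize the repo" option: embed code snippets with a sentence-transformer, index them with FAISS, and retrieve the most similar existing code to paste into the generation prompt as style/context examples. The repo path, file globs, embedding model, and chunking-by-file choice are all assumptions.

```python
import faiss
import numpy as np
from pathlib import Path
from sentence_transformers import SentenceTransformer

# Embedding model choice is an assumption; code-specific embedders also exist.
embedder = SentenceTransformer("all-MiniLM-L6-v2")

# 1. Chunk the repo into snippets (here: one truncated chunk per file, for simplicity).
files = list(Path("my_repo").rglob("*.cs")) + list(Path("my_repo").rglob("*.js"))
snippets = [f.read_text(errors="ignore")[:2000] for f in files]

# 2. Embed and index with FAISS (inner product == cosine on normalized vectors).
vectors = embedder.encode(snippets, normalize_embeddings=True)
index = faiss.IndexFlatIP(vectors.shape[1])
index.add(np.asarray(vectors, dtype=np.float32))

# 3. Retrieve the most relevant existing code for a new spec,
#    then include it in the LLM prompt as context.
query = embedder.encode(["spec: add retry logic to the payment client"],
                        normalize_embeddings=True)
_, idx = index.search(np.asarray(query, dtype=np.float32), 3)
context = "\n\n".join(snippets[i] for i in idx[0])
```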
    [Project] I built an open-source scraping API that returns structured JSON data using GPT.
    I decided to open-source my own web scraping API that I'm using to get information from different websites without using any selectors or XPath. Just provide the URL and a desired JSON schema, and it will return extracted data. Hope this can be helpful for someone. Cheers! https://github.com/semanser/JsonGenius submitted by /u/semanser [link] [comments]  ( 9 min )
    [R] Is self-correction a viable method to improve LLM reasoning? Probably not.
    Can LLMs actually improve their own reasoning by self-correcting mistakes? A new paper from DeepMind and the University of Illinois looks to answer this quantitatively. The results show that unaided, LLMs struggle at self-correction for reasoning tasks. The core issue is LLMs have trouble reliably evaluating the correctness of their own responses. They rarely identify flaws in initial reasoning. Sometimes LLMs even alter initially correct responses to become incorrect after self-correction! (I've personally seen this when interacting with ChatGPT many times and you probably have too). More complex techniques like critiquing between LLM instances don't help much either. External feedback or guidance looks necessary to improve reasoning (Well, some interesting parallels to this paper here about implicit improvement from preference data vs traditional RLHF). Self-correction does show promise for things like making responses more polite or safe though. Criteria there are more clear-cut. The authors argue we need to balance enthusiasm with realistic expectations on self-correction. It has a lot of limits for improving reasoning (at least with current models). But they suggest promising directions like incorporating high-quality external feedback from humans, training data, and tools. That could be key to unlocking self-correction's potential down the road. TLDR: Basically title... LLMs can't reliably self-correct reasoning yet. Maybe hybrid techniques combining self-correction with external guidance could work but we need more research. Full summary. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [P] NIDDK-CR Data Centric Challenge: Enhancing NIDDK datasets for future artificial intelligence applications
    Calling all AI researchers! Using data aggregation, harmonization, fusion, and other data enhancement methods, you can help the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) enhance the utility of NIDDK datasets for AI applications. The goal of the NIDDK Data Centric Challenge will be to generate an “AI-ready” dataset that can be used for future data challenges, using data on Type 1 Diabetes available through the NIDDK Central Repository. Register today! https://www.challenge.gov/?challenge=niddk-central-repository-data-centric-challenge submitted by /u/DataCentricChallenge [link] [comments]
    [D] off-topic, is Meta Llama 2 license agreement safe to sign for commercial use ?
    In the Meta Llama 2 license agreement (which can be found here), there is a "Prohibited Uses" section that clearly states several use cases the signer must accept, but several of them use the word "facilitate". As far as I can understand, if we use Llama 2 as part of a commercial product and some end user uses the product in a malicious way (say, causing the chatbot to write a recipe for mustard gas), then this could be read as the creator of the product facilitating the end user. So my questions are: do you think this is a fair interpretation of the agreement? Does that mean the creator is liable for whatever the model spits out? Is there a way to censor the model (short of retraining a new model or fine-tuning at a large scale)? Is there an open-source model that has already gone through this process and is safer for commercial use? As per @gentlecucumber's advice, I also posted it on r/legaladvice: https://www.reddit.com/r/legaladvice/comments/170ll2t/d_is_meta_llama_2_license_agreement_safe_to_sign/?utm_source=share&utm_medium=web2x&context=3 submitted by /u/Particular_Flower_12 [link] [comments]
    [D] TesseractOCR vs PaddleOCR vs EasyOCR for Japanese text extraction
    Which would be the best OCR toolkit to invest the effort in learning and building a pipeline around, for an OCR system that will be used to extract Japanese text? I tried Tesseract initially, and although I got some good results, I found it hard to do fine-tuning due to messy and outdated documentation. I haven't had the time to look at the other two OCR tools yet, but if anyone has any experience, please do share, especially regarding how easy or difficult the fine-tuning process is, as well as deploying the tuned models. submitted by /u/Spitfire_ex [link] [comments]  ( 9 min )
    [D] Adapting OpenSource GPT Models - requirements/possibilities?
    Hi, our company plans some budget in 2024 to invest in hardware for the following: running local LLMs so our coworkers can interact with a locally hosted, offline GPT similar to ChatGPT. Use cases: generating templates for emails, letters, etc.; translation (EN/GER/FR/SPA); querying internal knowledge bases and/or FAQs/HOWTOs. I did some research, but it is still hard for me to estimate the hardware / AI skill requirements to implement even something not a quarter as good as ChatGPT. I've played with Nomic's GPT4All, which comes close as a baseline. We can't use cloud services due to our data privacy policy, so I checked what would be a good starting point to invest in hardware. I came up with a gamer PC (octa-core Intel i9 / AMD Ryzen 7) with an NVIDIA RTX 4090 (24 GB) / Radeon RX 7900, 2 TB SSD and 64 GB RAM for approximately 3600 EUR. I am pretty sure that would be sufficient to host a decent LLM serving simultaneous client requests. But is there also a way to adapt / process our company's data? Most sources state that proper LLMs were trained using hundreds of NVIDIA A100s and thousands of CPUs. On the other hand, we would be fine with just fine-tuning a pretrained model. Could you please point me to some sources to learn more about possibilities and requirements, so we can make well-informed investment decisions? Also, we probably lack the required skills, and would be interested to learn whether there are companies and/or projects assisting with this kind of task. Thanks submitted by /u/EatTFM [link] [comments]  ( 9 min )
    [D] - Are LoRAs able to improve results on reasoning benchmarks or is full-parameter fine tuning required?
    Is there any good research on which benchmarks LoRAs are most effective at impacting, or are they relegated mostly to changing the style of an LLM's response? submitted by /u/30299578815310 [link] [comments]  ( 9 min )
    [D] How to test if regression model is statistically significantly better, including its test error?
    I have a regression model, predicting a popularity of a text. I have its performance metrics on test set, e.g. RMSE and MAE. This gives me an uncertainty estimate about its predictions. Now I want to transform the text in some way, e.g. give it to human experts or another model to "upgrade" (in terms of getting better popularity). So I have the original and transformed text. Now I have 3 popularity scores: true popularity for original text predicted popularity for original text predicted popularity for transformed text Obviously, if model MAE is for example around 5, and predicted popularity for transformed text is higher than for the original by 1.5, this can be totally random, due to errors in the model prediction. How can I measure if text transformation is beneficial, i.e. statistically significantly better than the original text, incorporating information about model quality? Requiring that the improvement has to be higher than model error would be incredibly strict. submitted by /u/qalis [link] [comments]
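    One hedged option, assuming the transformation is applied to a collection of texts (so paired original/transformed predictions exist per text), is a paired bootstrap over the per-text predicted differences; the prediction noise then shows up directly in the spread of the bootstrap means. This is a sketch of that idea, not the only valid test.

```python
import numpy as np

def paired_bootstrap_pvalue(pred_original: np.ndarray,
                            pred_transformed: np.ndarray,
                            n_boot: int = 10_000,
                            seed: int = 0) -> float:
    # One-sided test of H0: "the transformation does not increase predicted popularity".
    rng = np.random.default_rng(seed)
    diffs = pred_transformed - pred_original        # paired per-text differences
    n = len(diffs)
    boot_means = np.array([
        diffs[rng.integers(0, n, n)].mean() for _ in range(n_boot)
    ])
    # Fraction of bootstrap resamples in which the mean improvement is <= 0.
    return float((boot_means <= 0).mean())
```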
    [D] David Donoho: Data Science at the Singularity (pushback on AGI singularity, advocates for Open Science and reproducibility)
    submitted by /u/wojcech [link] [comments]  ( 9 min )
    [R] Tensor Programs VI: Feature Learning in Infinite-Depth Neural Networks
    Paper: https://arxiv.org/abs/2310.02244 Abstract: By classifying infinite-width neural networks and identifying the optimal limit, Tensor Programs IV and V demonstrated a universal way, called μP, for widthwise hyperparameter transfer, i.e., predicting optimal hyperparameters of wide neural networks from narrow ones. Here we investigate the analogous classification for depthwise parametrizations of deep residual networks (resnets). We classify depthwise parametrizations of block multiplier and learning rate by their infinite-width-then-depth limits. In resnets where each block has only one layer, we identify a unique optimal parametrization, called Depth-μP that extends μP and show empirically it admits depthwise hyperparameter transfer. We identify feature diversity as a crucial factor in deep networks, and Depth-μP can be characterized as maximizing both feature learning and feature diversity. Exploiting this, we find that absolute value, among all homogeneous nonlinearities, maximizes feature diversity and indeed empirically leads to significantly better performance. However, if each block is deeper (such as modern transformers), then we find fundamental limitations in all possible infinite-depth limits of such parametrizations, which we illustrate both theoretically and empirically on simple networks as well as Megatron transformer trained on Common Crawl. Interesting, great to see this line of work continued, muP was great, now Depth-muP submitted by /u/_puhsu [link] [comments]  ( 9 min )
  • Open

    Generative AI megatrends: Gen AI start-up ecosystem
    One of my students asked me: “Which is the best area/s for Gen AI start-ups?” This is not an easy question – mainly due to the dynamic nature of AI, but here are two reference points. The first is a Generative AI Tools Landscape from DataCamp. This gives both the categories and the subcategories for… Read More »Generative AI megatrends: Gen AI start-up ecosystem The post Generative AI megatrends: Gen AI start-up ecosystem appeared first on Data Science Central.  ( 19 min )
  • Open

    AI: Voice cloning tech emerges in Sudan civil war
    A campaign using AI voice cloning technology to impersonate Omar al-Bashir, the former leader of Sudan, has gained attention on TikTok. The anonymous account has been posting what it claims are 'leaked recordings' of the ex-president, despite Bashir not being seen in public for a year and being believed to be seriously ill. Experts warn that campaigns like this demonstrate how new tools can quickly and cheaply distribute fake content through social media. The authenticity of the recordings has been questioned, but evidence suggests that voice conversion software has been used to mimic Bashir's voice. TikTok has taken down the account, stating that it violated their guidelines on posting false content and the use of synthetic media. Source : https://www.bbc.co.uk/news/world-africa-66987869 submitted by /u/NuseAI [link] [comments]
    When AI tells you what you want to hear, even if it knows it's not true ... A Bard example
    I love Bard. It eloquently tells me things in a way that meets and exceeds my expectations, and even more than GPT-4. But what is Google's strategy behind programming it to say things that it knows are not true? Do they train it to say what the user wants to hear? If so, that's disconcerting to say the least, although could be a brilliant way to comb users for enhancement requests. Here is my most recent experience, a humorous example which I feel illustrates the point perfectly: Me: Bard, can you take a .wav audio file and create a phoneme transcript using the IPA symbol library? Bard: Yes, I can work from a .wav audio file to transcribe your speech into IPA phonemes. You can upload the file to a cloud storage service such as Google Drive. Me: Great. Here's the link to the file: <link…
    Just created /r/thisiswhyaiwasmade for the more silly side of AI generated content
    Join us! /r/thisiswhyaiwasmade is where we have fun and celebrate the enjoyable and entertaining side of the tech. Need a place to put your silly and playful prompts, AI generated content, and ideas for ways to bend the rules of AI? This is the place! submitted by /u/unknowingafford [link] [comments]
    CGPT-4, aside from solving climate change, what is the most positive thing that AI can do for the world?
    In the context of maximizing positive impact, AI has transformative potential in healthcare. Let's start with personalized medicine. AI can analyze vast data sets, consisting of genetic, environmental, and lifestyle variables, to tailor medical treatments for individuals. This level of personalization could substantially increase treatment effectiveness and reduce adverse side effects. We're not just talking about better pills; we're envisioning targeted gene therapies and custom-tailored interventions that could be more curative rather than merely palliative. Beyond personalization, AI can democratize healthcare access. Remote areas without specialists could have AI systems that diagnose and recommend treatments, making quality healthcare accessible without the need for extensive travel. Telemedicine could be supercharged by AI, offering medical consultations and second opinions to anyone with internet access. Moreover, AI could lead breakthroughs in drug discovery and epidemiology. It could analyze complex biochemical interactions at a speed unattainable by humans, slashing the time and cost to bring new drugs to market. On a larger scale, AI could monitor and model the spread of diseases, providing actionable insights for containment and treatment strategies. So, AI in healthcare is not merely an incremental improvement. It's a paradigm shift that could equalize healthcare access and significantly extend human life while improving its quality. All these advancements could happen within our lifetime, changing the face of medicine and, by extension, society. submitted by /u/Georgeo57 [link] [comments]
    What's the difference between a human's brain and AI?
    Functioning. Humans use the brain's computing power, memory, and ability to think, whereas AI-powered machines rely on data and specific instructions fed into the system. Besides, it takes humans a very long time to process and understand problems and get accustomed to them. submitted by /u/Virtual-Study-Campus [link] [comments]
    6 AI Apocalypse Scenarios And Why They're Wrong
    submitted by /u/arrowoftime [link] [comments]
    How to use custom instructions for ChatGPT like a Pro (Ultimate Guide for 2023)
    submitted by /u/Senior_tasteey [link] [comments]
    DeepMind cofounder is tired of ‘knee-jerk bad takes’ about AI
    Mustafa Suleyman, the cofounder of DeepMind and CEO of Inflection AI, discusses his concerns about AI risks and the need for precaution. He believes that while some extreme scenarios may be over the top, it's important to treat powerful technologies with caution. Suleyman highlights the middle layer of AI risks that people often underestimate, which involves the amplification of goals for both good and bad actors. He emphasizes the need to contain AI to prevent potential negative consequences. Suleyman talks about the balance between risks and opportunities in technology and the importance of considering both aspects. He mentions the hype around generative AI and the need to look beyond the surface to understand its true potential. Suleyman discusses the discussions with lawmakers about AI and the challenge of bridging the gap between policy makers and tech experts. Source : https://venturebeat.com/ai/deepmind-cofounder-is-tired-of-knee-jerk-bad-takes-about-ai/ submitted by /u/NuseAI [link] [comments]
    Does Sam Altman Know What He’s Creating?
    submitted by /u/norcalnatv [link] [comments]
    DeepMind, Univ. of Illinois: Is self-correction a viable method to improve LLM reasoning? Probably not.
    Can LLMs actually improve their own reasoning by self-correcting mistakes? A new paper from DeepMind and the University of Illinois looks to answer this quantitatively. The results show that unaided, LLMs struggle at self-correction for reasoning tasks. The core issue is LLMs have trouble reliably evaluating the correctness of their own responses. They rarely identify flaws in initial reasoning. Sometimes LLMs even alter initially correct responses to become incorrect after self-correction! (I've personally seen this when interacting with ChatGPT many times and you probably have too). More complex techniques like critiquing between LLM instances don't help much either. External feedback or guidance looks necessary to improve reasoning (Well, some interesting parallels to this paper here about implicit improvement from preference data vs traditional RLHF). Self-correction does show promise for things like making responses more polite or safe though. Criteria there are more clear-cut. The authors argue we need to balance enthusiasm with realistic expectations on self-correction. It has a lot of limits for improving reasoning (at least with current models). But they suggest promising directions like incorporating high-quality external feedback from humans, training data, and tools. That could be key to unlocking self-correction's potential down the road. TLDR: Basically title... LLMs can't reliably self-correct reasoning yet. Maybe hybrid techniques combining self-correction with external guidance could work but we need more research. Full summary. Paper is here. submitted by /u/Successful-Western27 [link] [comments]
    I need help finding a tool
    Does anybody know of a tool where I can take an image, have AI translate it, and replace the text in the same style so it reads in the new language? For example, translating a Japanese image to English and having it look exactly the same, just in English. I'm looking for a free one that doesn't require credits; it can be a desktop app or a website, it doesn't matter, it just needs to be free. submitted by /u/agentduckman12 [link] [comments]
    How much do I have to edit AI generated images to become my own IP?
    Hey there! I'm a one-man card game designer, and while juggling the project as well as my senior year of college, I have been relying heavily on AI-generated artwork to speed up my workflow with some illustrations and other forms of world-building. In regard to the recent legal decisions (in the US), in which any work produced by AI cannot be copyrighted, how much do I need to change the illustrations for them to become my own, if I even can at all? Thanks! Edit for clarity: I am also an illustrator, so this question comes from the perspective of an artist trying to save time and energy for other projects. submitted by /u/Luke192 [link] [comments]
    Comparative Evaluation of Fine-Tuned and Standard Language Models in Emulating Living Historical Figures: A Detailed Study Proposal
    submitted by /u/alcanthro [link] [comments]
    JPMorgan CEO Jamie Dimon: AI will lead to 3.5-day workweek | Fortune
    Jamie Dimon says the next generation of employees will work 3.5 days a week and live to 100 years old submitted by /u/AminoOxi [link] [comments]
    Google unveils Pixel 8 built for 'the generative AI era' | CNN Business
    submitted by /u/pehnsus [link] [comments]
  • Open

    Improve prediction quality in custom classification models with Amazon Comprehend
    In this post, we explain how to build and optimize a custom classification model using Amazon Comprehend. We demonstrate this using an Amazon Comprehend custom classification to build a multi-label custom classification model, and provide guidelines on how to prepare the training dataset and tune the model to meet performance metrics such as accuracy, precision, recall, and F1 score.  ( 8 min )
    Fast and cost-effective LLaMA 2 fine-tuning with AWS Trainium
    Large language models (LLMs) have captured the imagination and attention of developers, scientists, technologists, entrepreneurs, and executives across several industries. These models can be used for question answering, summarization, translation, and more in applications such as conversational agents for customer support, content creation for marketing, and coding assistants. Recently, Meta released Llama 2 for both […]  ( 7 min )
  • Open

    New tools are available to help reduce the energy that AI models devour
    Amid the race to make AI bigger and better, Lincoln Laboratory is developing ways to reduce power, train efficiently, and make energy use transparent.  ( 11 min )
  • Open

    OpenAI's justification for why training data is fair use, not infringement [pdf]
    submitted by /u/nickb [link] [comments]
    Traveling Words: A Geometric Interpretation of Transformers
    submitted by /u/nickb [link] [comments]
  • Open

    Tangent sum
    When I was writing my post on lemniscate functions yesterday, a line from the Wikipedia article seemed familiar for reasons I cannot place. Defining a tangent-sum operator as a ⊕ b := tan(arctan a + arctan b) gives cl² z ⊕ sl² z = 1. I feel like I’ve seen this tangent-sum used before, but […] Tangent sum first appeared on John D. Cook.  ( 6 min )
    Enriched categories
    We begin with a couple examples. First, the set of linear transformations from one vector space to another is itself a vector space. Second, the set of continuous linear operators from one Banach space to another is itself a Banach space. Or maybe better, this set can be made into a Banach space. In the […] Enriched categories first appeared on John D. Cook.  ( 6 min )
    p-norm trig functions and “squigonometry”
    This is the fourth post in a series on generalizations of sine and cosine. The first post looked at defining sine as the inverse of the inverse sine. The reason for this unusual approach is that the inverse sine is given in terms of an arc length and an integral. We can generalize sine by […] p-norm trig functions and “squigonometry” first appeared on John D. Cook.  ( 5 min )
    Geometric derivation of hyperbolic trig functions
    This is the third post in a series on generalizing sine and cosine. The previous post looked at a generalization of the sine and cosine functions that come from replacing a circle with a lemniscate, a curve that looks like a figure eight. This post looks at replacing the circle with a hyperbola. On the […] Geometric derivation of hyperbolic trig functions first appeared on John D. Cook.  ( 5 min )
  • Open

    HoloAssist: A multimodal dataset for next-gen AI copilots for the physical world
    HoloAssist is a new multimodal dataset consisting of 166 hours of interactive task executions with 222 participants. Discover how it offers invaluable data to advance the capabilities of next-gen AI copilots for real-world tasks. The post HoloAssist: A multimodal dataset for next-gen AI copilots for the physical world appeared first on Microsoft Research.  ( 10 min )
    Intern Insights: Dr. Madeleine Daepp with Jennifer Scurrell and Alejandro Cuevas
    Connecting with researchers, collaborating across disciplines, and exploring a new city—PhD students Jennifer Scurrell and Alejandro Cuevas talk to Senior Researcher Madeleine Daepp about the internship experience at Microsoft Research. The post Intern Insights: Dr. Madeleine Daepp with Jennifer Scurrell and Alejandro Cuevas appeared first on Microsoft Research.  ( 29 min )
  • Open

    Brains of the Operation: Atlas Meditech Maps Future of Surgery With AI, Digital Twins
    Just as athletes train for a game or actors rehearse for a performance, surgeons prepare ahead of an operation. Now, Atlas Meditech is letting brain surgeons experience a new level of realism in their pre-surgery preparation with AI and physically accurate simulations. Atlas Meditech, a brain-surgery intelligence platform, is adopting tools — including the MONAI Read article >  ( 7 min )
    Fall in Line for October With Nearly 60 New Games, Including Latest Game Pass Titles to Join the Cloud
    October brings more than falling leaves and pumpkin spice lattes for GeForce NOW members. Get ready for nearly 60 new games to stream, including Forza Motorsport and 16 more PC Game Pass titles. Assassin’s Creed Mirage leads 29 new games to hit the GeForce NOW library this week. In addition, catch a challenge to earn Read article >  ( 9 min )

  • Open

    Ring Attention with Blockwise Transformers for Near-Infinite Context
    submitted by /u/nickb [link] [comments]
    Think before you speak: Training Language Models With Pause Tokens
    submitted by /u/nickb [link] [comments]
    Towards Self-Assembling Artificial Neural Networks through Neural Developmental Programs
    submitted by /u/nickb [link] [comments]
    AI has been reading my mind.
    I know several people that tell me whenever they say something out loud, they start seeing it advertised to them or on their feed. But for me, if I think of certain things, even if I never said it out loud, it will appear on my feed.. has anything similar been happening to anyone else? submitted by /u/GuaranteedBigBoy [link] [comments]
  • Open

    [P] Open-source project to run locally LLMs in browser, such as Phi-1.5 for fully private inference
    Excited to introduce BlindChat (https://github.com/mithril-security/blind_chat), an open-source, privacy-centric alternative to ChatGPT for in-browser Conversational AI! We provide full local inference in browser, by using libraries from Hugging Face like transformers.js or candle for WASM inference. We have supported several small models, the latest one being Phi-1.5, the 1.3B model that beat Llama 2 7b! As Microsoft’s researchers mentioned in their paper, the model often produces incorrect code and statements. They are just suggestions, and this model is not trained for instruction tuning, so it might be harder to use than regular chat. More info on their model card (https://huggingface.co/microsoft/phi-1_5). We would love to have your feedback on our project, as we are aiming to build a privacy-first and open-source alternative to ChatGPT! submitted by /u/Separate-Still3770 [link] [comments]  ( 9 min )
    [D] What is the relation between learning rate and vanishing gradient problem?
    How can we tackle vanishing gradient problem by changing the learning rate? Is it possible? submitted by /u/InternationalBack472 [link] [comments]  ( 9 min )
    [P] Torchsummary not working with your layers again? Try this lightweight alternative
    pip install output-shape. It is a minimalistic alternative to torchsummary that simply prints the output shape of a layer or a custom layer. For torch.nn.MultiheadAttention, it handles the output shape and the attention matrix separately. https://github.com/avocardio/output-shape It currently only works with PyTorch models; TensorFlow / Keras support is coming soon, and JAX is also on the list for later! submitted by /u/capital-man [link] [comments]  ( 9 min )
    [D] Thoughts on current Vector DB landscape?
    Hello, What are your thoughts on current Vector DB offerings? For instance: Do you think the pricing for them is reasonable/viable? Do you think there’s a sufficient level of developer/user experience? What about for those who aren’t necessarily specialized in data? If you like a managed service, why do you prefer it over the open source alternatives? submitted by /u/LucasSaysHello [link] [comments]  ( 9 min )
    [R] NeuRBF: A Neural Fields Representation with Adaptive Radial Basis Functions
    Project Page Paper Code We present a novel type of neural fields that uses general radial bases for signal representation. State-of-the-art neural fields typically rely on grid-based representations for storing local neural features and N-dimensional linear kernels for interpolating features at continuous query points. The spatial positions of their neural features are fixed on grid nodes and cannot well adapt to target signals. Our method instead builds upon general radial bases with flexible kernel position and shape, which have higher spatial adaptivity and can more closely fit target signals. To further improve the channel-wise capacity of radial basis functions, we propose to compose them with multi-frequency sinusoid functions. This technique extends a radial basis to multiple Fourier radial bases of different frequency bands without requiring extra parameters, facilitating the representation of details. Moreover, by marrying adaptive radial bases with grid-based ones, our hybrid combination inherits both adaptivity and interpolation smoothness. We carefully designed weighting schemes to let radial bases adapt to different types of signals effectively. Our experiments on 2D image and 3D signed distance field representation demonstrate the higher accuracy and compactness of our method than prior arts. When applied to neural radiance field reconstruction, our method achieves state-of-the-art rendering quality, with small model size and comparable training speed. submitted by /u/Sirisian [link] [comments]  ( 9 min )
    [D]
    Hi guys! I am going to purchase a laptop for programming and AI tasks. I will be working on a simulation software project related to the trajectory of an object in 2D and 3D space. Which laptop would be the most suitable for these tasks? It should also have a long battery life, because the place where I work does not have enough power sockets. The first laptops that came to mind were the MacBook Pro with the M2 Pro chip and the Lenovo ThinkPad X1 Carbon Gen 10. Suggest me the best. submitted by /u/smitherium [link] [comments]  ( 9 min )
    [Discussion] Feature Selection Algorithms
    I have only 200 samples but about 30 features. What are some effective commonly used feature selection algorithms? I want to identify the features that play the most significant role in generating outcomes. submitted by /u/Shina-pig [link] [comments]  ( 9 min )
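    A few commonly used options in scikit-learn, shown on stand-in data with the same shape as the poster's (200 samples, 30 features); the choice of k and the estimators are assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression

# Stand-in for the poster's 200 x 30 dataset.
X, y = make_classification(n_samples=200, n_features=30, n_informative=8, random_state=0)

# 1. Filter method: rank features by mutual information with the outcome.
mi_selector = SelectKBest(mutual_info_classif, k=10).fit(X, y)
print("MI-selected:", np.flatnonzero(mi_selector.get_support()))

# 2. Wrapper method: recursive feature elimination around a simple model.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=10).fit(X, y)
print("RFE-selected:", np.flatnonzero(rfe.get_support()))

# 3. Embedded method: L1 (lasso-style) logistic regression zeroes out weak features.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
print("L1-kept:", np.flatnonzero(lasso.coef_[0] != 0))
```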
    [R] Will a small error be determining in the final decision for my paper?
    About a week ago, I submitted my first paper into one of the most prestigious Machine Learning conferences out there. This was a last minute submission, and my supervisor and I were working on it simultaneously until the very last moment. Sadly, my supervisor committed an error when writing the mathematical definition of a certain set, slightly changing its meaning. This change, even though small, changes the definition in such a way that the subsequent theorem and its proof isn't formally correct anymore, as it assumes the original definition of the set, not the new one. How much will this affect the decision of accepting or rejecting my paper? The whole method, results and consequences are still the same, no matter this definition. It's more a problem of a "formal" nature (here "formal" as a word in the mathematical sense). Is there a other way that I can inform about this error without changing the content maybe? I know that at some point, they give a chance to edit the original paper, but I don't know if this is after the decision to accept/reject. submitted by /u/howtorewriteaname [link] [comments]  ( 9 min )
    How can I apply object detection and image segmentation functionality to my current custom-trained Image Classification model? [D]
    So, a few months ago, I started developing a deep learning model made purely to differentiate whether the input image shows driftwood floating in water or a crocodile. I leveraged the pre-trained ResNet-50 model to train my classifier, and for that I downloaded almost 5k images of driftwood and crocodiles. Once training was complete, I deployed the model on a Hugging Face Spaces app, allowing my friends to put it to the test. But here's where I ran into a significant challenge: users could even upload their own selfies, and my model would attempt to predict whether they were a crocodile or a piece of driftwood! So how can I leverage object detection or an image segmentation pipeline so that when the user inputs an image, it first detects the object in the photo and then classifies whether the detected object is a crocodile or driftwood? If neither is found, it should return "No object found" or something like that. submitted by /u/meWhoObserves [link] [comments]  ( 9 min )
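    One hedged sketch of the two-stage idea: run a pretrained detector, crop each confident detection, and pass the crops to the existing crocodile-vs-driftwood classifier. The detector choice, confidence threshold, and classifier interface are assumptions; note that a COCO-pretrained detector has no crocodile or driftwood classes, so a class-agnostic or custom-trained detector may work better in practice.

```python
import torch
import torchvision
from torchvision.transforms import functional as TF

# Generic pretrained detector (COCO classes) used only to propose regions;
# a detector trained on the target classes would be a better fit.
detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()

def classify_regions(image, classifier, score_thresh: float = 0.5):
    # image: PIL image; classifier: the existing ResNet-50 crocodile-vs-driftwood model.
    tensor = TF.to_tensor(image)
    with torch.no_grad():
        detections = detector([tensor])[0]
    results = []
    for box, score in zip(detections["boxes"], detections["scores"]):
        if score < score_thresh:
            continue
        x1, y1, x2, y2 = box.int().tolist()
        crop = TF.resize(tensor[:, y1:y2, x1:x2], [224, 224]).unsqueeze(0)
        with torch.no_grad():
            results.append(classifier(crop))      # crocodile / driftwood probabilities
    return results if results else "No object found"
```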
    [R] Large Language Models Represents Space and Time
    Paper - https://arxiv.org/abs/2310.02207 submitted by /u/MysteryInc152 [link] [comments]  ( 8 min )
    [R] Help Shape the Future of Machine Learning: Take Our Short Survey and Let's Create Something Amazing Together!
    Hello Redditors in r/MachineLearning We are the team behind ML Workbench, an upcoming integrated platform designed to streamline your entire machine learning lifecycle. From data preprocessing and model training to validation and deployment, we aim to make the process as seamless as possible. But here's the thing: we need your insights to build something that truly resonates with the community and solves real-world problems. 📝 Click Here to Take the Survey Why Should You Care? Unified Experience: Imagine managing all your ML tasks in one integrated environment. High-Performance Computing: We're leveraging powerful A100 GPUs to accelerate your work. User-Centric Design: Whether you're a beginner or a pro, the platform is designed to cater to all skill levels. Collaboration: Built-in features to make team collaboration effortless. What's in the Survey? The survey contains questions about your current challenges, the tools you use, and what you'd love to see in an ML platform. It should only take about 5-10 minutes to complete. Thank You Gift As a small token of our appreciation, we're offering exclusive early access to the platform for selected participants. Don't miss this chance to be among the first to experience what we're building! 📝 Click Here to Take the Survey Your feedback is crucial for us to create a tool that we hope will make a significant positive impact in the machine learning community. Thank you for taking the time to read this post and participate in our survey. Cheers, The ML Workbench Team submitted by /u/nonononottodayorever [link] [comments]  ( 9 min )
    [P] Video Event Detection
    Hi, I'm looking to create a model that, given a sequence of frames from a video, returns a probability distribution over a set of events that may have occurred in those frames (probably 5-10 events). The training data will consist of video and hand-labelled frame index/event pairs. I'm not too concerned about handling simultaneous events. It would be super helpful to get some suggestions on a model architecture that would yield the best results and/or good papers/examples that achieve something similar. Thanks! submitted by /u/Dredgefort [link] [comments]  ( 9 min )
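    One common starting point for this kind of task, offered as a sketch rather than a recommendation of the best architecture: a per-frame CNN feature extractor followed by a recurrent layer over time and a softmax over the event classes. The backbone, hidden size, and number of events are assumptions.

```python
import torch
import torch.nn as nn
import torchvision

class FrameEventClassifier(nn.Module):
    """Per-frame ResNet features -> GRU over time -> distribution over events."""

    def __init__(self, num_events: int = 10, hidden: int = 256):
        super().__init__()
        backbone = torchvision.models.resnet18(weights="DEFAULT")
        self.features = nn.Sequential(*list(backbone.children())[:-1])  # drop the FC head
        self.temporal = nn.GRU(input_size=512, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_events)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, 3, H, W)
        b, t = clips.shape[:2]
        frames = clips.flatten(0, 1)                       # (batch*time, 3, H, W)
        feats = self.features(frames).flatten(1)           # (batch*time, 512)
        feats = feats.view(b, t, -1)
        out, _ = self.temporal(feats)
        return self.head(out[:, -1]).softmax(dim=-1)       # event probabilities per clip
```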
    [P] Retrieval augmented generation with OpenSearch and reranking [Video tutorial]
    I created a video tutorial that tries to demonstrate that semantic search (using embeddings) is not always necessary for RAG (retrieval augmented generation). It was inspired by the following Cohere blog post: https://txt.cohere.com/rerank/ I code up a minimal RAG pipeline: OpenSearch -> Rerank -> Chat completion (without using Langchain or similar libraries) and then see how it performs on various queries. Hope some of you find it helpful. Feel free to share any feedback! Video link: https://youtu.be/OsE7YcDcPz0 submitted by /u/mildlyoverfitted [link] [comments]  ( 9 min )
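    A hedged sketch of the general shape of such a pipeline (not the video's exact code): BM25 retrieval from OpenSearch, a rerank pass, then a chat completion over the top documents. The index name, field names, model names, API keys, and exact SDK signatures are assumptions that may differ across library versions.

```python
import cohere
from openai import OpenAI
from opensearchpy import OpenSearch

os_client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])
co = cohere.Client("YOUR_COHERE_KEY")          # placeholder key
llm = OpenAI()

def answer(question: str, index: str = "docs", k: int = 20, top_n: int = 3) -> str:
    # 1. Lexical (BM25) retrieval from OpenSearch -- no embeddings needed.
    hits = os_client.search(index=index, body={
        "size": k, "query": {"match": {"text": question}},
    })["hits"]["hits"]
    docs = [h["_source"]["text"] for h in hits]

    # 2. Rerank the candidates with a cross-encoder style reranker.
    reranked = co.rerank(model="rerank-english-v2.0", query=question,
                         documents=docs, top_n=top_n)
    context = "\n\n".join(docs[r.index] for r in reranked.results)

    # 3. Answer using only the reranked context.
    resp = llm.chat.completions.create(
        model="gpt-4",  # model choice is an assumption
        messages=[{"role": "user",
                   "content": f"Answer using only this context:\n{context}\n\nQuestion: {question}"}],
    )
    return resp.choices[0].message.content
```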
    [R] Hacking an NLP benchmark: How to score 100 points on AMR parsing
    AMR parsing is a fun task where researchers map texts onto little graphs that explicate their meaning, so called Abstract Meaning Representations (AMRs). While arguably not the top NLP benchmark regarding popularity, research has been active for the last 10 years, including at major NLP conferences such as ACL/NAACL/EACL/EMNLP etc. Funnily, I recently found some vulnerabilities in the evaluation protocol, and if we exploit these vulnerabilities, we can get the highest score on the benchmark. To get an overview over the issue (without understanding AMR), imagine a cooking contest that takes place regularly, say, once a year. In all events, we have the same judge, participants are amateurs, meals are scored on 0 to 100, with 100 meaning “it can’t possibly get better”. Over the years, the …  ( 10 min )
    [D] Looking for an article related to machine learning in medicine to be presented at a journal club
    Hi all, I'm curious if anyone has a stand-out article they believe would prompt a lively discussion in a journal club I have coming up. Something that may have people take sides, or maybe a recent breakthrough in the ML space as it relates to clinical/health care. ​ Thanks! submitted by /u/veilofosiris [link] [comments]  ( 9 min )
    [R] Think before you speak: Training Language Models With Pause Tokens
    Paper - https://arxiv.org/abs/2310.02226 submitted by /u/MysteryInc152 [link] [comments]  ( 8 min )
    [P] Good models to use for multimodal object detection when both the modalities are image based or some object detection models which support ensembling out of the box like Yolov5?
    So basically I have a dataset with images of vehicles in top down view in both RGB and IR, what are some models I can use for both unimodal and multimodal object detection to compare their performance. Links to GitHub repos would be helpful. Thanks submitted by /u/Xyber5 [link] [comments]  ( 9 min )
    [P] Using pre-trained models as features?
    Hey everyone! Currently, I am working on a project around a music emotion classification/regression model. Basically, I am trying to predict a score for each emotion in a given song. The problem is that my dataset has quite imbalanced scores (y): most scores are centered around a certain range, so the model has difficulty predicting scores that are further away from the mean values. I had the idea of bringing in audio classification models pre-trained on other datasets and problems, as there are a bunch of well-performing pre-trained classification models out there already. The predictions of these pre-trained models (e.g. genre, instrument, etc.) would be used as features alongside the original spectrogram in my model. I know this won't solve the problem of imbalance in the scores, but I thought maybe it could improve performance, as the model would have more features to work with. Does this make sense? I appreciate any input. submitted by /u/Kniggi [link] [comments]  ( 9 min )
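    Mechanically, the idea above amounts to concatenating the frozen models' predicted probabilities onto the existing feature matrix. A hedged sketch with placeholder model objects (the predict_proba interface and the model names are assumptions, not specific libraries):

```python
import numpy as np

def augment_with_pretrained(spectrogram_feats: np.ndarray,
                            genre_model, instrument_model,
                            audio_batch) -> np.ndarray:
    # Predicted class probabilities from frozen pre-trained models become extra features.
    genre_probs = genre_model.predict_proba(audio_batch)            # (n, n_genres)
    instrument_probs = instrument_model.predict_proba(audio_batch)  # (n, n_instruments)
    return np.concatenate([spectrogram_feats, genre_probs, instrument_probs], axis=1)
```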
    [D] LOMO underrated
    Does anyone have an idea why the LOMO optimizer (low-memory optimizer), which was released a few months ago, is not widely available and why everyone still uses either Adam or SGD, even though the paper looks really promising? submitted by /u/RedMoula [link] [comments]  ( 9 min )
    [P] Camera based monitoring of infant's breathing
    Hi! I have recently seen systems that monitor an infant's breathing rate through a camera. I have read several articles on the topic, where people used things like a 3D camera, RGB, or an interferometric radar sensor. Do you have any ideas on how to measure this accurately? submitted by /u/kaina_m [link] [comments]  ( 9 min )
    [R] Towards Self-Assembling Artificial Neural Networks through Neural Developmental Programs
    submitted by /u/hardmaru [link] [comments]  ( 8 min )
    [D] How Do You Track Projects in a Scaling ML Team?
    I am part of a Machine Learning team that has experienced significant growth recently. When we were a small team, tracking projects was straightforward. However, as the team has expanded, it's become increasingly challenging to keep track of everything. We are part of a larger corporation, so we have access to tools for creating epics and boards. However, these corporate tools are too generic and don't provide the level of detail I need for internal management. Specifically, I'm looking for a way to track model versions, dataset versions, and the overall status of our projects. I'd also like to be able to assign team members to projects. Currently, we use a MIRO board, but it's disorganized and difficult to read and update. I'd love to hear what tools or strategies you've used for similar situations, especially since our team is expected to grow even more, making tracking increasingly complex. submitted by /u/Spiritual_Narwhal649 [link] [comments]  ( 9 min )
  • Open

    Lemniscate functions
    In the previous post I said that you could define the inverse sine as the function that gives the arc length along a circle, then define sine to be the inverse of the inverse sine. The purpose of such a backward definition is that it generalizes to other curves besides the circle. For example, it […] Lemniscate functions first appeared on John D. Cook.  ( 5 min )
    Generalized trigonometry
    In a recent post I mentioned in passing that trigonometry can be generalized from functions associated with a circle to functions associated with other curves. This post will go into that a little further. The equation of the unit circle is and so in the first quadrant The length of an arc from (1, 0) […] Generalized trigonometry first appeared on John D. Cook.  ( 5 min )
  • Open

    A Mine-Blowing Breakthrough: Open-Ended AI Agent Voyager Autonomously Plays ‘Minecraft’
    For NVIDIA Senior AI Scientist Jim Fan, the video game Minecraft served as the “perfect primordial soup” for his research on open-ended AI agents. In the latest AI Podcast episode, host Noah Kravitz spoke with Fan on using large language models to create AI agents — specifically to create Voyager, an AI bot built with Read article >  ( 6 min )
    How AI Helps Fight Wildfires in California
    California has a new weapon against the wildfires that have devastated the state: AI. A freshly launched system powered by AI trained on NVIDIA GPUs promises to provide timely alerts to first responders across the Golden State every time a blaze ignites. The ALERTCalifornia initiative, a collaboration between California’s wildfire fighting agency CAL FIRE and Read article >  ( 6 min )
  • Open

    LLMs May Be The Trojan Horse That Modernizes Software Development
    submitted by /u/geekteam6 [link] [comments]
    Why PepsiCo is powering your snacks with AI
    Using AI to improve Cheetos? That's something PepsiCo has experimented with. On today’s POLITICO Tech, Athina Kanioura, chief strategy and transformation officer for PepsiCo, says that using AI to make employees faster and more efficient hasn’t led PepsiCo to replace human workers, as many fear, and explains why the company has determined that in some jobs the technology is simply off limits. Listen to the interview here: https://politico-tech.simplecast.com/episodes/why-pepsico-is-powering-your-snacks-with-ai submitted by /u/smo279 [link] [comments]
    New Paper: Enabling Language Models to Implicitly Learn Self-Improvement From Data
    LLMs keep getting more capable at generating natural language. But there's always room for improving the quality and alignment of their responses. Typically this requires lots of human effort to collect more training data. So researchers are exploring ways for models to self-improve without human involvement. Many methods use prompting - giving the LLM instructions to critique and refine its responses. But coming up with comprehensive prompts is challenging. The new approach proposed, called PIT, lets models learn self-improvement implicitly from human preference data instead. It reformulates reinforcement learning to maximize the gap between an original response and improved response conditioned on the original. This taps into the implicit guidance in the preference data on what constitutes better quality, so no manual rubrics are needed. PIT uses curriculum reinforcement learning - first improving easy references, then switching to the LLM's own samples. Experiments on real and synthetic datasets show PIT significantly outperforms prompting methods like Self-Refine. It improved response quality 7-34% across conditions without any human involvement. This demonstrates a promising direction for LLMs to align better with human preferences autonomously as they learn from experience. No need for human bottlenecks when expanding to new domains or underserved use cases. Very cool! TLDR: New method PIT enables LLMs to implicitly learn to refine themselves from human preference data, no prompts needed. Big improvement over prompting approaches. Full Summary Arxiv is here: https://arxiv.org/abs/2310.00898 submitted by /u/Successful-Western27 [link] [comments]
    $5k in grants or $250k funding for AI startups. Backed by OG's
    AI Grant, an accelerator for AI startups created by Nat Friedman and Daniel Gross, is offering $5k grants or $250k in funding. The grant includes an uncapped SAFE investment of $250,000 for AI-native product startups, $350,000 in Azure credits, a summit in San Francisco with advisors and founders, and various other startup benefits and credits. Applications for Batch 3 will open in a few months, but early applications are accepted. The program is open to anyone, and it is looking for companies or projects that leverage AI models in a useful or engaging way. Source : https://aigrant.com/ submitted by /u/NuseAI [link] [comments]
    AI will teach everyone to read and write. It's already begun.
    https://www.imagineworldwide.org/ "What is Child-Directed, Tech-Enabled Learning? Children drive their own learning, at their own pace, using software that provides a complete, research-based curriculum and pedagogy. Adults play a supportive, facilitative role. The software is delivered to the learner on a tablet, without connectivity, and charged by solar power or other appropriate energy sources... With hundreds of millions of children out of school or lacking access to effective schooling, this model can provide every child, everywhere access to learning. Solutions can work without internet access or grid power. Adults play facilitative, rather than instructional, roles. The annual unit cost of the learning solution is less than $7 per child and declining. This includes hardware, software, accessories, power, shipping, and implementation support from Imagine." submitted by /u/Georgeo57 [link] [comments]
    AI is replacing customer service jobs across the globe
    Artificial intelligence (AI) is replacing customer service jobs around the world, with chatbots being used to interact directly with customers and solve problems independently. This shift is expected to have a profound effect on economies, particularly in countries like India and the Philippines where call centers provide millions of jobs. While some argue that AI will provide support to remaining call center workers and improve job satisfaction, others warn that it could lead to job losses and a need for workforce adaptation. The use of AI software tools in call centers has shown potential for improving productivity and customer satisfaction. Source : https://www.washingtonpost.com/technology/2023/10/03/ai-customer-service-jobs/ submitted by /u/NuseAI [link] [comments]
    Female-founded AI startups win just 2% of funding deals in UK
    Female-founded AI startups in the UK account for just 2% of funding deals over the past decade, according to a report by the Alan Turing Institute. When female-founded companies do secure funding, they raise an average of £1.3m per deal, compared to £8.6m raised by all-male founder teams. The report highlights the urgent need for gender balance in AI investment, as the industry is predicted to grow significantly in the coming years. Recommendations to improve gender balance include improving recruitment, monitoring investment practices, and diversifying the ecosystem. There is an increasing demand for generative AI products, with leading tech companies investing heavily. Gender diversity gaps and uneven progress rates for ethnic and racial groups are observed across investment firms. AI products have shown biases, such as passport checkers working less efficiently with darker skin and tools reinforcing gender stereotypes. In 2019, a UN agency found that assigning female genders to digital assistants like Siri and Alexa perpetuated harmful gender biases. Source : https://www.theguardian.com/technology/2023/oct/04/female-founded-ai-startups-win-just-2-of-funding-deals-in-uk submitted by /u/NuseAI [link] [comments]
    I used Riffusion (Stable Diffusion, but for music) to turn my own music into "jazz", "Radiohead", "Muse" or "Nirvana" songs, I'm amazed by the results
    submitted by /u/cI_-__-_Io [link] [comments]
    Visa Announces $100 Mn Fund for Generative AI Companies
    submitted by /u/Agitated-Spell3979 [link] [comments]
  • Open

    My Impressions (and Application) of the Heidelberg Laureate Forum 2023
    This September, I had the chance to attend the Heidelberg Laureate Forum (HLF) for the second — and probably last — time. The HLF is an incredible experience for young researchers: Mirroring the Lindau Nobel Laureate Meetings, the organizers invite laureates from math and computer science together with young researchers pursuing their undergraduate, graduate or post-doc studies. In this article, I want to share impressions and encourage students to apply next year! The post My Impressions (and Application) of the Heidelberg Laureate Forum 2023 appeared first on David Stutz.  ( 7 min )
  • Open

    Simplify medical image classification using Amazon SageMaker Canvas
    Analyzing medical images plays a crucial role in diagnosing and treating diseases. The ability to automate this process using machine learning (ML) techniques allows healthcare professionals to more quickly diagnose certain cancers, coronary diseases, and ophthalmologic conditions. However, one of the key challenges faced by clinicians and researchers in this field is the time-consuming and […]  ( 11 min )
    Create an HCLS document summarization application with Falcon using Amazon SageMaker JumpStart
    Healthcare and life sciences (HCLS) customers are adopting generative AI as a tool to get more from their data. Use cases include document summarization to help readers focus on key points of a document and transforming unstructured text into standardized formats to highlight important attributes. With unique data formats and strict regulatory requirements, customers are […]  ( 9 min )
    Automate prior authorization using CRD with CDS Hooks and AWS HealthLake
    Prior authorization is a crucial process in healthcare that involves the approval of medical treatments or procedures before they are carried out. This process is necessary to ensure that patients receive the right care and that healthcare providers are following the correct procedures. However, prior authorization can be a time-consuming and complex process that requires […]  ( 7 min )
  • Open

    Scalable spherical CNNs for scientific applications
    Posted by Carlos Esteves and Ameesh Makadia, Research Scientists, Google Research, Athena Team Typical deep learning models for computer vision, like convolutional neural networks (CNNs) and vision transformers (ViT), process signals assuming planar (flat) spaces. For example, digital images are represented as a grid of pixels on a plane. However, this type of data makes up only a fraction of the data we encounter in scientific applications. Variables sampled from the Earth's atmosphere, like temperature and humidity, are naturally represented on the sphere. Some kinds of cosmological data and panoramic photos are also spherical signals, and are better treated as such. Using methods designed for planar images to process spherical signals is problematic for a couple of reasons. Firs…  ( 92 min )
  • Open

    Why is the DQN method only suitable for small discrete action spaces? What is the issue if the action space is large and continuous?
    submitted by /u/aabra__ka__daabra [link] [comments]
    Up to date Metaworld documentation
    Hello everyone, I want to start experimenting with multi-tasking and meta-learning, so I pip installed Metaworld, which is currently on version 2.0.0 if I'm not mistaken. I wanted to ask, in case anybody knows, if there's any recent updated documentation, because the Farama Foundation on GitHub, which is responsible for maintaining Metaworld, has outdated code and documentation (for example, the code in the GitHub README calls env.step(a) and expects 4 return values instead of the 5 that the newer version outputs). From what I understand, they are gathering contributors for a big push to bring the code and documentation up to date again, but this announcement was 7 months ago. Sorry for the potentially wrong format of this question-post, I'm relatively new to reddit. I would appreciate any further knowledge on this topic and thanks to everyone who's taking the time to read it! Metaworld distribution from the Farama Foundation on GitHub: https://github.com/Farama-Foundation/Metaworld submitted by /u/South_Book_5625 [link] [comments]
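    For reference, a minimal sketch of the newer Gymnasium-style loop the post is referring to (the constructor calls follow the project's README and the task name is only illustrative; check the installed version for the exact API):

        # Newer Metaworld/Gymnasium-style loop: reset() returns (obs, info) and
        # step() returns 5 values instead of the 4 shown in the outdated README.
        import random
        import metaworld

        ml1 = metaworld.ML1("pick-place-v2")              # illustrative benchmark/task name
        env = ml1.train_classes["pick-place-v2"]()
        env.set_task(random.choice(ml1.train_tasks))

        obs, info = env.reset()
        action = env.action_space.sample()
        obs, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated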
    The future of game testing is here, and it is powered by Artificial Intelligence! 🔥
    Hi everyone! We used our opensource library SheepRL 🐑 and our PyTorch implementation of DreamerV3 on Crafter, an open-world survival game, featuring randomly generated 2D worlds, in which players have the freedom to explore a large and expansive map and need to forage for food, collect materials, build tools and find shelter. Here is a short video 👉 https://youtu.be/7XEBT2msUUQ In open-world games, ensuring they are playable and bug-free is crucial, but is becoming increasingly difficult and time-consuming using manual game testing. Maximizing exploration using Reinforcement Learning is extremely useful for testing games at scale, because of the wide variety of gameplay scenes the player may encounter. Why is the test on Crafter so interesting for game testing? Because Crafter evaluates a large number of general capabilities related to the RL agent, like strong ability to generalise (new generated maps for each episode), to deal with partial observability (each input image reveals only a small part of the world) and to long-term reasoning and survival. These abilities are very useful for testing games at scale, providing developers with insights to optimise gameplay and player experience. The future of game testing is here, and it is powered by Artificial Intelligence! 🔥 --- ❌ Are you interested in joining the project community? Get in touch 👉 https://github.com/Eclectic-Sheep/sheeprl ❌ SheepRL 🐑 is open-source, fully written in PyTorch and accelerated with LightningFabric - by Lightning AI. Feel free to use it for your AI projects, and if you want to contribute, we are more than happy to accept your pull requests! ❤️ submitted by /u/Manu_Orobix [link] [comments]
    Can I use Continuous algorithms (e.g. TD3) for Discrete Action spaces?
    My environment has a hybrid action space and I was wondering if I can use continuous algorithms for discrete action spaces. I'm asking this because, well, the agent can't learn and I'm trying to find the source of the error; I was wondering if this was it. My assumptions on solving this problem: - Discrete is a subspace of continuous, thus continuous algorithms should be able to handle discrete action spaces as well. - A non-hybrid action-space algorithm will be simpler than a hybrid-action-space algorithm. Method (I'm only describing the discrete action here): - Use TD3 as the training algorithm, with no modification from the original training code. The TD3 algorithm has been verified on Pendulum and other environments created for unit-test purposes. - The policy network outputs the a…
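    A common workaround (my own sketch, not from the post) is to let the continuous policy emit one score per discrete action and take the argmax at the environment boundary; note that no gradient flows through the argmax, which is one reason TD3-style algorithms often learn poorly on discrete problems.

        # Expose a Discrete(n) environment as a Box(n,) one so a continuous algorithm
        # such as TD3 can act on it; the wrapper converts scores to a discrete choice.
        import numpy as np
        import gymnasium as gym

        class DiscreteToBoxWrapper(gym.ActionWrapper):
            def __init__(self, env):
                super().__init__(env)
                n = env.action_space.n
                self.action_space = gym.spaces.Box(low=-1.0, high=1.0, shape=(n,), dtype=np.float32)

            def action(self, act):
                return int(np.argmax(act))  # continuous scores -> discrete action index

        env = DiscreteToBoxWrapper(gym.make("CartPole-v1"))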

  • Open

    Video Game Voice Actors Are Ready to Strike over AI. Here’s Why
    Video game voice actors are prepared to go on strike over the use of AI in game development. The current contract negotiations between the Screen Actors Guild-American Federation of Television and Radio Artists (SAG-AFTRA) and video game companies have stalled, with the major issues being pay raises and the use of AI to alter or generate actors' performances. SAG-AFTRA wants protections for its members to ensure their work is not stolen or replaced by AI. If negotiations don't progress, voice actors, stunt artists, and motion capture performers could potentially go on strike, leading to delays in game releases and recasting of beloved performers. The voice actors' strike in 2016 resulted in improvements to pay, and now they are prepared to strike again to fight for their rights. Video game performances are often seen as assets to be extracted and inserted into games, rather than recognizing the humanity and quality of life of the performers. The use of AI in game development raises concerns about how companies will use advances in generative AI to steal work or put performers out of a job. SAG-AFTRA wants transparency, consent, and compensation when it comes to the use of AI in games. Members of SAG-AFTRA have voted in favor of authorizing a strike, meaning voice actors, stunt artists, and motion capture performers could potentially join the picket line if negotiations don't progress. The strike could lead to delays in upcoming game releases and the recasting of performers if companies refuse to meet the union's demands. The fight for voice actors' rights is an existential one, as they want to retain the rights to their own voices and images and achieve wages that keep up with inflation Source : https://kotaku.com/sag-aftra-strike-voice-actor-spider-man-ai-union-1850874117 submitted by /u/NuseAI [link] [comments]  ( 9 min )
    [Question] Any 3X AI?
    Wanted to see if there are any 3X AI generated images available? I’m looking to see how I could use AI to generate images for my website. submitted by /u/IamMoe8868 [link] [comments]  ( 8 min )
    TikTok ran a deepfake ad of an AI MrBeast hawking iPhones for $2
    TikTok ran an ad featuring a deepfake of MrBeast offering iPhone 15 Pros for $2. AI-generated deepfake content is becoming more pervasive on social media platforms. Platforms like TikTok are facing challenges in moderating and handling the rise of AI deepfakes. MrBeast raised concerns about the ability of social media platforms to handle AI deepfakes. TikTok removed the ad and associated account for policy violations. Unauthorized AI-generated content featuring celebrities is a growing problem in platform advertising. The issue is expected to worsen as AI technology improves and becomes more accessible. Transparency and disclosure are crucial in AI-generated ad content featuring celebrities. TikTok is aware of the pervasiveness of AI-generated content on its platform and is taking steps to address it. Source : https://www.businessinsider.com/tiktok-ran-deepfake-ad-mrbeast-as-ai-generated-content-spreads-2023-10 submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Infinitia will apparently let you create your own AI enabled social simulations
    Came across this upcoming game which supposedly lets you create your own worlds and characters to live in the world... they also released a research paper explaining how they're doing it, using LLMs in all sorts of ways, primarily for reasoning and language. I think it could be a pretty fun take on passive games, just populating a world with your characters, checking up on them occasionally, putting them in weird situations lol. infinitia.ai for those who wanna check it out. The NPCs do seem to be acting in an interesting way, as I saw in this video they posted on twitter... https://twitter.com/infinitia_app/status/1707102187518628245 What do y'all think? Another Smallville clone, or something interesting? submitted by /u/SeaJeweler3723 [link] [comments]  ( 9 min )
    Efficient AI design of robots.
    submitted by /u/DrJosh [link] [comments]  ( 8 min )
    From Stone to Silicon: The Odyssey of Humanity and Technology
    submitted by /u/Einsof__ [link] [comments]  ( 8 min )
    Don't Worry, AI Cannot Takeover the World, It Will Run Out of Battery
    The article discusses the importance of batteries in AI technology and how they limit the capabilities of AI robots. It explores the challenges of current battery technology and the need for better solutions. The article emphasizes the significance of developing ideal batteries that can provide long-lasting power without degradation. Source : https://notes.arkinfo.xyz/p/dont-worry-ai-cannot-takeover-the submitted by /u/NuseAI [link] [comments]  ( 9 min )
    GPT-4 outperforms its rivals in new AI benchmark suite GPT-Fathom
    ByteDance and the University of Illinois researchers have developed an improved benchmark suite with consistent parameters, called GPT-Fathom, that indicates GPT-4, the engine behind the paid version of ChatGPT, significantly outperforms leading LLMs, including its biggest competitor, Claude 2. For the latest advancements in AI, look here first. ​ https://preview.redd.it/v4fo8zser0sb1.png?width=1292&format=png&auto=webp&s=7e29fe9ac1af3efcb936ee61e9202717eed7e702 GPT-Fathom's breakthrough The new benchmark suite, GPT-Fathom, addresses consistent settings issues and prompt sensitivity, attempting to reduce inconsistencies in LLM evaluation. In a comparison using GPT-Fathom, GPT-4 outperformed over ten leading LLMs, crushing the competition in most benchmarks, and showing significant performance leaps from GPT-3 to its successors. Performance specifics The gap in performance was especially pronounced against Claude 2, ChatGPT's biggest rival. GPT-4's Advanced Data Analysis model exhibited superior performance in coding, giving it an edge as compared to LuckLlama 2, the current best-performing open-source model. Llama 2-70B showed comparable or better performance than gpt-3.5-turbo-0613 in safety and comprehension but displayed worse performance in "Mathematics", "Coding", and "Multilingualism". The seesaw effect The research team noted a 'seesaw effect' where an improvement in one area can lead to degradation in another. For instance, GPT-4 saw a performance drop on the Mathematical Geometry Simple Math (MGSM) benchmark, despite improving its performance significantly on the text comprehension benchmark DROP. (source) P.S. If you like this kind of analysis, I write a free newsletter that tracks the most relevant news and developments in AI. Professionals from Meta, Google, and OpenAI are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    Runway has launched Gen 2 Director mode. The speed at which this company works is Insane
    submitted by /u/First_Development101 [link] [comments]  ( 8 min )
    How Ai Could Upend Foreign Policy - An Interview with Ian Bremner and Mustafa Suleyman
    submitted by /u/finger_puppet_self [link] [comments]  ( 8 min )
    CGPT-4 on some ways "prompt/response" posts advance AI
    Prompt: Explain why posting prompts and your responses speeds up AI development in the following ways: It allows us to analyze responses to prompts that we would probably never have thought of. It allows us to share ideas that the open source community would advance. It allows us to better democratize the process of alignment, bringing in as many people as possible. CGPT-4: Posting prompts and AI responses serves multiple functions that are beneficial to the progress of AI development. First off, it exposes AI to a myriad of queries that developers might not have otherwise considered. This wide range of prompts allows the machine to improve its natural language processing abilities, making it more versatile and effective in interpreting and responding to human language. The more dive…  ( 9 min )
    A.I Makes a Video game on the App Store
    submitted by /u/usmansid98 [link] [comments]  ( 8 min )
    Infinite context windows? Streaming LLMs can be extended to infinite sequence lengths without any fine-tuning.
    LLMs like GPT-3 struggle in streaming uses like chatbots because their performance tanks on long texts exceeding their training length. I checked out a new paper investigating why windowed attention fails for this. By visualizing the attention maps, the researchers noticed LLMs heavily attend initial tokens as "attention sinks" even if meaningless. This anchors the distribution. They realized evicting these sink tokens causes the attention scores to get warped, destabilizing predictions. Their proposed "StreamingLLM" method simply caches a few initial sink tokens plus recent ones. This tweaks LLMs to handle crazy long texts. Models tuned with StreamingLLM smoothly processed sequences with millions of tokens, and were up to 22x faster than other approaches. Even cooler - adding a special "[Sink Token]" during pre-training further improved streaming ability. The model just used that single token as the anchor. I think the abstract says it best: We introduce StreamingLLM, an efficient framework that enables LLMs trained with a finite length attention window to generalize to infinite sequence length without any fine-tuning. We show that StreamingLLM can enable Llama-2, MPT, Falcon, and Pythia to perform stable and efficient language modeling with up to 4 million tokens and more. TLDR: LLMs break on long convos. Researchers found they cling to initial tokens as attention sinks. Caching those tokens lets LLMs chat infinitely. Full summary here Paper link: https://arxiv.org/pdf/2309.17453.pdf submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
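    The cache policy being described reduces to something like this sketch (my own paraphrase of the idea, not the authors' code; the paper also re-assigns positions inside the rolling cache, which is omitted here):

        # Keep the first few "attention sink" tokens plus a sliding window of recent
        # tokens in the KV cache; everything in between is evicted.
        def evict_kv_cache(keys, values, num_sink=4, window=1024):
            """keys/values: per-token cache entries ordered oldest to newest."""
            if len(keys) <= num_sink + window:
                return keys, values
            kept_keys = keys[:num_sink] + keys[-window:]
            kept_values = values[:num_sink] + values[-window:]
            return kept_keys, kept_values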
    Where do I produce free intro and outro AI music for my Podcast for free.
    I am starting a podcast on Psychology and Philosophy submitted by /u/21bce [link] [comments]  ( 8 min )
    BackerKit Will Restrict the Use of AI Art
    Crowdfunding site BackerKit has announced a new policy that restricts the use of solely AI-generated content on its platform. The policy aims to address concerns regarding ownership of content, ethical sourcing of data, and compensation for the process of creating content. Projects that lack a minimum requirement of human input will not be allowed to crowdfund on the BackerKit site. There is some flexibility with AI generative fill and the use of AI transcription services, but a high level of human input is required to satisfy the policy. BackerKit will automatically exclude all content uploaded by creators for their projects from AI training in support of this policy. The new restrictions will go into effect on October 4, giving creators time to alter their projects if they are using AI-generated images and text. Source : https://gizmodo.com/backerkit-ai-art-new-policy-crowdfunding-generative-1850891882 submitted by /u/NuseAI [link] [comments]  ( 9 min )
    One-Minute Daily AI News 10/2/2023
    iPhone designer Jony Ive is reportedly talking to OpenAI CEO Sam Altman about making an AI hardware device.[1] Visa announced today that it plans to invest $100 million in companies developing generative AI technologies and applications “that will impact the future of commerce and payments.”[2] More than 40% of labor force to be affected by AI in 3 years, Morgan Stanley forecasts. [3] Tom Hanks: Don't fall for "AI version of me" promoting dental plan.[4] Sources: [1] https://www.businessinsider.com/chatgpt-head-iphone-designer-jony-ive-ai-device-openai-report-2023-9?amp [2] https://techcrunch.com/2023/10/02/visa-earmarks-100m-to-invest-in-generative-ai-companies/ [3] https://www.cnbc.com/2023/10/02/more-than-40percent-of-labor-force-to-be-impacted-by-ai-in-three-years-morgan-stanley-forecasts.html [4] https://www.cbsnews.com/amp/news/tom-hanks-ai-version-of-me-promoting-dental-plan/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
  • Open

    [D] What are some effective dimensionality reduction (unsupervised feature selection) techniques for a high dimensional, sparse dataset?
    I am considering comparing mutual information scores, but I also don't think I understand MI well enough. For example, I(X;Y) = H(X) + H(Y) - H(X,Y). To me, visualizing H(X) and H(Y) as venn diagrams and H(X,Y) as the information from both X, Y (like an overlapping venn diagram) makes me think that when X, Y are disjoint, then MI is 0 and when X, Y overlap completely, then the MI score will be high. So, I'm thinking that a high MI value is "bad" since this means X, Y would be redundant. I am not sure if my understanding here is correct. Another method I have tried is to binarize the data for each feature (represented as rows in my dataset) using "present" (1) and "absent" (0). The main issue I have run into doing this is that I am trying to then create a distribution to compare the fea…  ( 10 min )
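    For the redundancy reading of MI described above, one concrete check (a minimal sketch on toy binary data, my own illustration) is the pairwise MI matrix between binarized features: a high value between two features suggests one of them can be dropped, while MI against a target label would be used the opposite way (higher is better).

        # Pairwise mutual information between binarized features to flag redundant ones.
        import numpy as np
        from sklearn.metrics import mutual_info_score

        rng = np.random.default_rng(0)
        X = (rng.random((1000, 20)) > 0.9).astype(int)   # toy sparse binary data: samples x features

        n = X.shape[1]
        mi = np.zeros((n, n))
        for i in range(n):
            for j in range(i + 1, n):
                mi[i, j] = mi[j, i] = mutual_info_score(X[:, i], X[:, j])

        # e.g. greedily drop one feature from each pair whose MI exceeds a chosen threshold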
    [D] Best interface to use LLMs for code: Chat or completion?
    Hi everyone, I am quite interested in understanding the community's feedback on which interface best leverages LLMs for code productivity. Because LLMs tend to make mistakes, I have mostly used chat-like interfaces, like ChatGPT, as they allow me to interact with the model and converge to a conclusion. I haven't used Copilot for a while, but my feeling was that it could do some boilerplate correctly and then quickly started suggesting code that would be misleading and could actually hurt productivity. It might have changed since then, but that was my feeling back then. What is your favorite option and why? View Poll submitted by /u/Separate-Still3770 [link] [comments]  ( 9 min )
    [D] ML input data has to be derived from a larger dataset
    Hello everyone. I am curious to know if anyone has encountered an ML problem like this, and if so, I seek your advice. Usually in ML classification, such as the Iris dataset, each row represents a sample and each column a parameter, right? My problem is that my classification parameters have to be derived from a range of values (parent data). I have taken the mean of the parent values to generate the parameters for the ML input data. This results in lower classification accuracies using Random Forest and XGBoost. Has anyone encountered a similar situation where the data has to be generated from a range of other datasets? Is there any other way to do this? I did not find any papers or articles on the web, so I'm just asking. I can generate additional parameters from other statistics such as the median, standard deviation, etc., which can improve the classification accuracy but can make interpretation of the results a little weird, domain-wise. I wish to avoid this if possible. submitted by /u/notmyfault7676 [link] [comments]  ( 9 min )
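    One common pattern (a minimal sketch with hypothetical column names, not from the post) is to summarize each parent range with several named statistics rather than the mean alone, which keeps the derived features interpretable:

        # Each sample's features are summary statistics of its parent range of values.
        import pandas as pd

        parent = pd.DataFrame({
            "sample_id": [1, 1, 1, 2, 2, 2],
            "sensor_a":  [0.2, 0.4, 0.9, 1.1, 1.3, 1.0],
        })

        features = parent.groupby("sample_id")["sensor_a"].agg(["mean", "median", "std", "min", "max"])
        features.columns = [f"sensor_a_{stat}" for stat in features.columns]
        # 'features' now has one interpretable row per sample for Random Forest / XGBoost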
    [D] Book review for Meta's ML Design interview? Machine Learning System Design Interview (by Ali Aminian and Alex Xu)
    I'm preparing for the ML system design interview for Meta, and I searched for various resources. This book (ML System Design Interview (by Ali Aminian & Alex Xu)) seems like a solid structured resource that covers solutions to case studies in detail. Has anyone used it to prepare for Meta's ML System Design interview? Thoughts? Khang's book doesn't seem to have great reviews. Chip Huyen's book (Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications) doesn't seem very focused on interview prep?? Also, happy to hear about other cool resources to prepare. Thanks very much! submitted by /u/irEFrienfk [link] [comments]  ( 9 min )
    [R] Open X-Embodiment: Robotic Learning Datasets and RT-X Models - DeepMind 2023 - RT-X exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms!
    Blog: https://www.deepmind.com/blog/scaling-up-learning-across-many-different-robot-types https://robotics-transformer-x.github.io/ here you can also find the Datasets and Code! Paper: https://robotics-transformer-x.github.io/paper.pdf Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train “generalist” X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. https://preview.redd.it/oxzutrhtb1sb1.jpg?width=1693&format=pjpg&auto=webp&s=37b8b1dbf5f489dc2c8eaca4d15cb9c32ebc2660 https://preview.redd.it/ldsiwshtb1sb1.jpg?width=1494&format=pjpg&auto=webp&s=fdbf0f91c705acf11bff854f6d6af82dddd47021 https://preview.redd.it/ikk18jitb1sb1.jpg?width=1693&format=pjpg&auto=webp&s=e50b443dc4b0266a0480d54c4f92a0b708485797 https://preview.redd.it/t5wmciitb1sb1.jpg?width=1361&format=pjpg&auto=webp&s=2971fd645acb6dcbed2ca3522e311d0772c45964 ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [D] Biggest problems with ML in industry?
    For all my corporate ML engineers I have a question, what are the most annoying / biggest problems you face when developing/deploying ML in industry? This can be anywhere from data, to tuning, to even MLOPS. submitted by /u/hai_cben [link] [comments]  ( 9 min )
    [D] Difficulty with paper implementations on google colab
    I am not from a CS background; my knowledge is from online courses and books, all of which used some variation of Jupyter notebooks, so my coding knowledge can be lacking sometimes. I am trying to run the code of some computer vision papers on newer samples. I understand the papers and the underlying mechanisms. However, I fail to decipher the code provided in the associated GitHub repositories. Usually, these repositories contain instructions on how to recreate the experiment on some specific data using the shell. But I am using Google Colab for this purpose, as I don't have access to a GPU, and I found it impossible to recreate the experiments in Colab using shell commands, let alone extend them to newer samples. I haven't done this before, and there aren't really any tutorials or resources on how to do it. Ideally, what I am trying to do is separate out the model, input some images, get the output, and interpret it. I am stuck, and I would really appreciate some help or advice in this regard. Right now I am trying to work with this paper: meta ood. I would appreciate any help, advice, or resources. I feel very lost. Thanks in advance. submitted by /u/franticpizzaeater [link] [comments]  ( 9 min )
    Repurposing a personal desktop computer [P]
    Hello! I'm debating turning my old desktop (old CPU but relatively new GPU, a 3080 or 90) into an ML box that I can remote into. I'm sure people here have done something similar, and I was wondering if anyone could point me towards some resources for getting it off the ground, pitfalls to avoid, or suggestions. I'm an active data scientist/researcher for my job and this would just be for fun side projects, but I have some pretty glaring holes in my knowledge of computers (like the best way to set this up: should I uninstall Windows and install Ubuntu, or is Windows fine?). Honestly I'm sure my ignorance will be pretty apparent from the questions I'm asking/not asking, so any advice anyone has would be welcome! Thanks! Sorry if this is the wrong subreddit for this sort of thing. submitted by /u/shebaiscool [link] [comments]  ( 9 min )
    [R] Generative memory: generative diffusion models are equivalent to modern Hopfield nets
    https://arxiv.org/abs/2309.17290 submitted by /u/LucaAmbrogioni [link] [comments]  ( 8 min )
    [D] Stuck in Automation of AI models
    Hello everyone! ​ I'm currently working on a project and have hit a roadblock in automating the deployment of my machine-learning models. Can anyone provide guidance on the best practices or tools for streamlining the deployment process? Specifically, I'm looking to create a seamless workflow where models can be easily uploaded, deployed on the cloud, and accessible through APIs. Any insights or advice would be greatly appreciated! ​ Automation!!! submitted by /u/homelander81 [link] [comments]  ( 9 min )
    [P] The Case of the Missing Masterpiece
    Hi, I just wanted to share an applied image classification problem that I worked on a few years ago: https://vdalv.github.io/2018/09/01/missingMasterpiece.html submitted by /u/vdalv [link] [comments]  ( 9 min )
    Need to build a XAI model to explain the behaviour of an IDS [P]
    Hello, I need help from someone who knows about XAI. I have to create an XAI model to interpret the results of an AI model, an MLP, that works as an IDS classifier. I have no idea how to do it and I have been completely blocked for 2.5 years. This is the final project of my degree and I just don't know how to do it, and my tutor isn't very helpful. If anyone is able to help, I would explain what I have to do and would be very grateful. Thanks for your help. submitted by /u/elMandarine [link] [comments]  ( 9 min )
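    One standard starting point for this kind of task (a sketch under my own assumptions, not from the post) is a model-agnostic explainer such as SHAP applied to the trained MLP's predicted probabilities; the dataset below is a synthetic stand-in for the IDS data:

        # Per-feature attributions for an MLP classifier via SHAP's KernelExplainer.
        import shap
        from sklearn.datasets import make_classification
        from sklearn.neural_network import MLPClassifier

        X, y = make_classification(n_samples=2000, n_features=20, random_state=0)  # stand-in for IDS features
        model = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300).fit(X, y)

        background = shap.sample(X, 100)                       # small background sample
        explainer = shap.KernelExplainer(model.predict_proba, background)
        shap_values = explainer.shap_values(X[:50])            # attributions for 50 rows
        shap.summary_plot(shap_values, X[:50])                 # which features drive the classifier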
    [D] Optimal scheduling tool with AI/ML recommendations
    Hello all, I'm trying to plan out development of a new web platform for workforce management but have little experience. We all know that hard coding can be done for general scheduling, including managers polling shifts based on labor category, staff assignments, conflict resolution, emergency scheduling, etc. But what I want to research is: how can I ensure that one optimal schedule is automatically computed using AI/machine learning tools, so that I don't have to go through the list of hard-coded generated schedules (I'm sure these will work fine, but I still want to compute one ultimate schedule)? submitted by /u/Playful-Bed-2183 [link] [comments]  ( 9 min )
    [R] Break-A-Scene: Extracting Multiple Concepts from a Single Image
    ​ Break-A-Scene: Given a single image with multiple concepts, annotated by loose segmentation masks, our method can learn a distinct token for each concept, and use natural language guidance to re-synthesize the individual concepts or combinations of them in various contexts. Project Page: https://omriavrahami.com/break-a-scene/ Code is publicly released! Abstract Text-to-image model personalization aims to introduce a user-provided concept to the model, allowing its synthesis in diverse contexts. However, current methods primarily focus on the case of learning a single concept from multiple images with variations in backgrounds and poses, and struggle when adapted to a different scenario. In this work, we introduce the task of textual scene decomposition: given a single image of a scene that may contain several concepts, we aim to extract a distinct text token for each concept, enabling fine-grained control over the generated scenes. To this end, we propose augmenting the input image with masks that indicate the presence of target concepts. These masks can be provided by the user or generated automatically by a pre-trained segmentation model. We then present a novel two-phase customization process that optimizes a set of dedicated textual embeddings (handles), as well as the model weights, striking a delicate balance between accurately capturing the concepts and avoiding overfitting. We employ a masked diffusion loss to enable handles to generate their assigned concepts, complemented by a novel loss on cross-attention maps to prevent entanglement. We also introduce union-sampling, a training strategy aimed to improve the ability of combining multiple concepts in generated images. We use several automatic metrics to quantitatively compare our method against several baselines, and further affirm the results using a user study. Finally, we showcase several applications of our method. ​ submitted by /u/sgd_is_all_you_need [link] [comments]  ( 9 min )
    [R] MIT, Meta, CMU Researchers: LLMs trained with a finite attention window can be extended to infinite sequence lengths without any fine-tuning
    LLMs like GPT-3 struggle in streaming uses like chatbots because their performance tanks on long texts exceeding their training length. I checked out a new paper investigating why windowed attention fails for this. By visualizing the attention maps, the researchers noticed LLMs heavily attend initial tokens as "attention sinks" even if meaningless. This anchors the distribution. They realized evicting these sink tokens causes the attention scores to get warped, destabilizing predictions. Their proposed "StreamingLLM" method simply caches a few initial sink tokens plus recent ones. This tweaks LLMs to handle crazy long texts. Models tuned with StreamingLLM smoothly processed sequences with millions of tokens, and were up to 22x faster than other approaches. Even cooler - adding a special "[Sink Token]" during pre-training further improved streaming ability. The model just used that single token as the anchor. I think the abstract says it best: We introduce StreamingLLM, an efficient framework that enables LLMs trained with a finite length attention window to generalize to infinite sequence length without any fine-tuning. We show that StreamingLLM can enable Llama-2, MPT, Falcon, and Pythia to perform stable and efficient language modeling with up to 4 million tokens and more. TLDR: LLMs break on long convos. Researchers found they cling to initial tokens as attention sinks. Caching those tokens lets LLMs chat infinitely. Full summary here Paper link: https://arxiv.org/pdf/2309.17453.pdf submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [D] Really good dataset for a Course Capstone
    Hey everyone! My friends and I are taking a Data Science course in our university. We are modestly versed in ML/DL techniques, and want to use everything we know on a really good capstone project for this course. We are looking for a dataset where we can demonstrate a nice variety of techniques to really blow the socks off our Professor. Ideally we'd like this to be stemming from something basic that most would consider "Data Science", as in something with a tabular dataset and elements of classification. Though we still want chances to bring in what we know from outside the course: for example, if there's images to supplement the dataset we could use Image Classification models or something multimodal to bring in more features, if there's natural language data then we could use LLMs to extract salient features etc. More importantly though, we want something whose exploration can be really motivated so it doesn't seem we're only in it for the ML aspect. Thank you! submitted by /u/Subject-Revolution-3 [link] [comments]  ( 9 min )
    [D] Competitiveness in ML research
    I've been diving deep into the world of machine learning research, and I'm genuinely baffled: how on Earth do some researchers seem to pump out paper after paper? I mean, there's only 24 hours in a day, right? Are academic minions (i.e. PhD students) doing all the heavy lifting? Or maybe some highly efficient workflows I'm not privy to? On a more serious note, I would like a career in ML, and the sheer volume and pace of these publications is making me feel a bit disheartened. How is this prolificity possible? Any words of encouragement or advice? submitted by /u/blabboy [link] [comments]  ( 9 min )
    [D] Why should I use a hosted/cloud VectorDB solutions over a serverless or vector store?
    Why the hell should I use a cloud-based or server-hosted solution over an easy-peasy serverless variant like LanceDB? Even a FAISS vector store is enough for most small-to-medium use cases. I often see posts like "oh my stack is... Pinecone, Chroma, weaviate_io", and they just ingest mini-sets of data, what the hell man. submitted by /u/Dear_Bullfrog193 [link] [comments]  ( 9 min )
    [P] FontoGen: generating true-type fonts
    I'd like to share a project that I've spent a few weekends working on. FontoGen is an autoregressive encoder-only transformer model that's capable of generating TrueType fonts. GitHub: https://github.com/SerCeMan/fontogen Weights: https://huggingface.co/SerCe/fontogen Blog post with more details: https://serce.me/posts/02-10-2023-hey-computer-make-me-a-font The project is largely an exploration of whether generating fonts natively, line by line, is possible. I'm not aware of any previous research that achieves the same results for complete fonts. This is my first ML-specific project, and I would appreciate any feedback on the model architecture, and I'm also happy to answer any questions you may have. submitted by /u/SerCeMan [link] [comments]  ( 9 min )
    [D] What happens after removing the causal mask of LLaMA?
    The causal mask in LLaMA serves as a protective barrier to prevent information leakage. However, in certain tasks, leveraging information leakage can be a beneficial strategy for enhancing performance, particularly in tasks like token classification, such as Named Entity Recognition (NER). Interestingly, the paper titled "Label Supervised LLaMA Finetuning" (available at https://arxiv.org/abs/2310.01208) reveals a significant performance boost in token classification when the causal mask is removed. submitted by /u/seanlee97 [link] [comments]  ( 9 min )
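    In generic PyTorch terms (a minimal sketch, not the LLaMA internals or the paper's code), the change the paper exploits amounts to which mask is handed to attention:

        # Causal (decoder-style) vs. bidirectional attention differ only in the mask.
        import torch
        import torch.nn.functional as F

        q = k = v = torch.randn(1, 8, 16, 64)   # (batch, heads, seq_len, head_dim)

        causal = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        bidirectional = F.scaled_dot_product_attention(q, k, v, is_causal=False)  # "mask removed"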
    [R] RA-DIT: Retrieval-Augmented Dual Instruction Tuning
    New paper that proposes instruction-tuning with in-context retrieval-augmentation to improve SOTA LLMs in cases where access to large, external knowledge sources is needed. Tested on LLaMA 65B, 13B and 7B. https://arxiv.org/abs/2310.01352 submitted by /u/todpole3 [link] [comments]  ( 9 min )
    [D] How do you scale computationally intensive Python scripts?
    Hey ML community, I'm wondering how people currently go about scaling their Python programs. Let's say, for instance, you're doing batch inference using an LLM and each prediction takes 2-3 minutes to process; how would you go about scaling that to make a million predictions? I'm asking this question because a few months back I started building a tool to quickly parallelize Python functions across thousands of machines in the cloud. I'm focused on making the barrier to interacting with the cloud extremely low and want to know all the core alternatives out there. Also, if you have any advice on starting a business I'd love to hear it. submitted by /u/Ok_Post_149 [link] [comments]  ( 9 min )
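    As a baseline for comparison, a minimal single-machine sketch (the predict function is a stand-in): fan the work out over local processes with the standard library; the same map-style pattern is what most multi-machine frameworks generalize.

        # Parallel batch inference over local processes with the standard library.
        from concurrent.futures import ProcessPoolExecutor

        def predict(prompt):
            # stand-in for a 2-3 minute LLM call
            return prompt.upper()

        prompts = [f"example {i}" for i in range(1_000)]

        if __name__ == "__main__":
            with ProcessPoolExecutor(max_workers=8) as pool:
                results = list(pool.map(predict, prompts, chunksize=8))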
    [D] What is the highest quality automatic image captioning solution?
    I make very high quality LoRAs and fine-tuned Stable Diffusion models. These models yield very good results, but more importantly they are very easy to use, as I have always captioned my images in natural spoken language (no weird booru tags and all that jazz). The most labor-intensive process in the workflow is image captioning. For example, my last project had almost 10,000 images in the dataset. Every single image was manually captioned by me, as the quality of all automated solutions I tried was subpar with too many accuracy issues. I have tried BLIP auto-captioning and LLaVA, but they still were not accurate enough for what I needed. I am hoping someone here can suggest a solution, if one exists, thanks. submitted by /u/no_witty_username [link] [comments]  ( 9 min )
    [D] (Interview Help) Do you know any good resources for interview case studies in the finance domain (especially dealing in loan and credit cards)
    I'm preparing for a data science interview and am looking for case study prep resources, especially for the financial domain (loans and credit cards). Mainly, I want to understand some good metrics for the financial domain, ways to break down the questions and create a rough data model, kinds of conditions to take into consideration (eg. Seasonality), kinds of effects that can be used expected (like opportunities and risks), etc. Any resources or help is greatly appreciated! submitted by /u/how_the_turn_tablez [link] [comments]  ( 9 min )
  • Open

    Help Restricting Actions
    Hello, I am new to RL and currently working on a school project that requires it. I am making a model to play a game very similar to Wordle, so for the purposes of this post it may as well be Wordle. Right now I am trying to get it to work with this gym, https://github.com/zach-lawless/gym-wordle, and I will make my tweaks later. This gym has a MultiDiscrete action space, which makes sense to me for a word, though I don't know if that's best. To validate words, it has its own exception type. I am trying to train this with stable_baselines3, but the exception keeps being raised, since the agent tries to guess garbled words like "xcjhr". Is there a way I can validate actions before they are made, so the model is restricted to only guessing valid words? Is there a better way to do this? It doesn't need to be the best, it really only has to sorta work. Any help is appreciated, thanks! submitted by /u/ClackHack [link] [comments]  ( 9 min )
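    One simple workaround (a sketch under my own assumptions about the underlying env, not tested against gym-wordle) is to shrink the action space to an index into the valid word list and translate it back to the letter encoding the wrapped environment expects:

        # Restrict the agent to valid words by making the action an index into the word list.
        import numpy as np
        import gymnasium as gym

        class ValidWordActionWrapper(gym.ActionWrapper):
            def __init__(self, env, valid_words):
                super().__init__(env)
                self.valid_words = valid_words
                self.action_space = gym.spaces.Discrete(len(valid_words))

            def action(self, idx):
                word = self.valid_words[int(idx)]
                # e.g. "crane" -> [2, 17, 0, 13, 4], assuming the env wants letter indices
                return np.array([ord(c) - ord("a") for c in word])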
    Looking For Advice on Training and Reward Functions
    Hi Everyone, I'm venturing into a new territory of Reinforcement Learning (RL) through a personal project, despite having a solid background in various other ML domains. I'm developing an RL agent to play Skyjo, a turn-based card game, and I'm encountering some challenges related to reward optimization and game-ending decisions by the agents. I'd appreciate any advice or insights you might have! Project Overview: Objective: Develop an RL model to play Skyjo competitively. Environment: Built using Gymnasium and Pytorch. Agents: Two agents working in tandem - one for card selection (discard/draw) and the other for action and location selection. Training: 4-8 agent instances play against each other. Repository: https://github.com/grantslewis/auto_skyjo Reward Structure: Small p…  ( 10 min )
    My frustration level with Torch/Keras/Tensorflow and DQNs is killing me
    RANT: I've tried every possible example I can get my hands on. I've looked at reference examples. I've looked at Medium articles. I've looked at stuff written by college freshmen. Every example I find for a DQN written for either Torch or TensorFlow (and either tf_agents or Keras) seems to have either a nasty bug preventing it from working or such a severe memory leak that it is unusable. I tried Torch recently and was doing some simple gridworlds. It does fine for tiny gridworlds like 5x5. I decided to push it a little (not much at all) to a known 21x21 gridworld from recognized papers; the reference example died and ran out of memory after 3000 episodes. I mean, really? 3000 episodes? I ran on CPU and gave it 64GB. I don't know how much memory this SHOULD take. I can do it in a Q-Table for…
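    A common culprit in such examples is a replay buffer that grows without bound (for instance a Python list of transitions that also retains autograd graphs); a pre-allocated NumPy ring buffer keeps memory flat regardless of how many episodes run. A minimal sketch of that idea (my own, not from any of the referenced examples):

        # Fixed-capacity replay buffer backed by pre-allocated NumPy arrays.
        import numpy as np

        class ReplayBuffer:
            def __init__(self, capacity, obs_shape):
                self.capacity = capacity
                self.obs      = np.zeros((capacity, *obs_shape), dtype=np.float32)
                self.next_obs = np.zeros((capacity, *obs_shape), dtype=np.float32)
                self.actions  = np.zeros(capacity, dtype=np.int64)
                self.rewards  = np.zeros(capacity, dtype=np.float32)
                self.dones    = np.zeros(capacity, dtype=np.bool_)
                self.idx, self.full = 0, False

            def add(self, obs, action, reward, next_obs, done):
                i = self.idx
                self.obs[i], self.actions[i], self.rewards[i] = obs, action, reward
                self.next_obs[i], self.dones[i] = next_obs, done
                self.idx = (i + 1) % self.capacity
                self.full = self.full or self.idx == 0

            def sample(self, batch_size):
                hi = self.capacity if self.full else self.idx
                j = np.random.randint(0, hi, size=batch_size)
                return self.obs[j], self.actions[j], self.rewards[j], self.next_obs[j], self.dones[j]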
    Advice to improve outcome on a turn-based strategy game
    Hello everyone, I'm a total beginner in the reinforcement learning (RL) community, and I would appreciate some advice on a problem I'm currently facing. I've created a simple 2D turn-based game with only movement at the moment (I will also add combat features when I have success with training an AI for the movements). Game The rules are simple : A grid of 14x40 (560 cells in total) 1 Agent with a limited number of Move Point (MP) 1 Target that does not move (atm) The agent can end its turn to get its MP back I already implemented a pathfinding algorithm using A* which works really well but I would like to train an AI to reach the target as fast as possible (turn-wise). Here is a simulation of a state : ​ https://preview.redd.it/0p5yijnb60sb1.png?width=442&format=png&auto=…  ( 10 min )
    Cleanba, our new distributed DRL platform is finally out 🤗
    submitted by /u/vwxyzjn [link] [comments]  ( 8 min )
  • Open

    DSC Weekly 3 October 2023
    Announcements Top Stories In-Depth The post DSC Weekly 3 October 2023 appeared first on Data Science Central.  ( 20 min )
    Generative AI Megatrends: ChatGPT can see, hear and speak – but what does it mean when ChatGPT can think?
    One of the most impressive generative AI applications I have seen is ViperGPT. The image / site explains it best, and the steps are laid out there. This example, earlier this year, showed the potential of multimodal LLMs. And as of last week, that future is upon us: ChatGPT can now see, hear & speak. What are the implications… Read More »Generative AI Megatrends: ChatGPT can see, hear and speak – but what does it mean when ChatGPT can think? The post Generative AI Megatrends: ChatGPT can see, hear and speak – but what does it mean when ChatGPT can think? appeared first on Data Science Central.  ( 20 min )
    Cracking the code: The rising demand for data scientists in various industries
    In the ever-evolving landscape of the digital era, the relentless quest for deriving actionable insights from a sea of information has become the cornerstone of innovation and strategy. As businesses and organizations strive to navigate the complex corridors of big data, the spotlight invariably falls upon the expertise of data scientists, the modern-day architects of… Read More »Cracking the code: The rising demand for data scientists in various industries The post Cracking the code: The rising demand for data scientists in various industries appeared first on Data Science Central.  ( 21 min )
    Generative AI megatrends: How many LLMs would you subscribe to?
    I recently subscribed to OpenAI GPT-4 for the Code Interpreter/Advanced Data Analysis feature. We are using it in our class at the University of Oxford. It's really cool, and we are also awaiting the multimodal OpenAI features. Recently, a well-known AI critic said that he does not see how generative AI companies could be… Read More »Generative AI megatrends: How many LLMs would you subscribe to? The post Generative AI megatrends: How many LLMs would you subscribe to? appeared first on Data Science Central.  ( 19 min )
    A few highlights of the Efficient Generative AI Summit (EGAIS)
    Large language models (LLMs) for generating text and vision models for generating images are notoriously inefficient. The larger they get, the more power hungry they become.   Kisaco Research in September hosted a one-day event in Santa Clara dedicated to the topic of generative artificial intelligence (GAI) efficiency, followed by a three-day Summit on Hardware and… Read More »A few highlights of the Efficient Generative AI Summit (EGAIS) The post A few highlights of the Efficient Generative AI Summit (EGAIS) appeared first on Data Science Central.  ( 21 min )
  • Open

    AI copilot enhances human precision for safer aviation
    Designed to ensure safer skies, “Air-Guardian” blends human intuition with machine precision, creating a more symbiotic relationship between pilot and aircraft.  ( 8 min )
  • Open

    Accelerate Foundation Models Research: Supporting a global academic research ecosystem for AI
    A diverse research ecosystem is essential to realizing the promise of AI. Accelerate Foundation Models Research aims to expand access to powerful models, engaging academics outside of computer science to pursue a broad range of important opportunities. The post Accelerate Foundation Models Research: Supporting a global academic research ecosystem for AI appeared first on Microsoft Research.  ( 10 min )
  • Open

    Meet the Maker: Robotics Student Rolls Out Autonomous Wheelchair With NVIDIA Jetson
    With the help of AI, robots, tractors and baby strollers — even skate parks — are becoming autonomous. One developer, Kabilan KB, is bringing autonomous-navigation capabilities to wheelchairs, which could help improve mobility for people with disabilities. The undergraduate from the Karunya Institute of Technology and Sciences in Coimbatore, India, is powering his autonomous wheelchair Read article >  ( 6 min )
    CG Geek Makes VFX Look Easy This Week ‘In the NVIDIA Studio’
    Releasing a 3D tutorial dubbed The Easiest VFX Tutorial Ever takes supreme confidence and the skills to back it up. Steve Lund a.k.a. CG Geek — the featured artist of this week’s In the NVIDIA Studio installment — has both in spades.  ( 8 min )
  • Open

    From graph theory to category theory
    Let G be a directed graph whose nodes are the positive integers and whose edges represent relations between two integers. In our first example we’ll draw an edge from x to y if x is a multiple of y. In our second example we’ll draw an edge from x to y if x ≥ y. […] From graph theory to category theory first appeared on John D. Cook.  ( 6 min )
    Test functions
    Test functions are how you can make sense of functions that aren’t really functions. The canonical example is the Dirac delta “function” that is infinite at the origin, zero everywhere else, and integrates to 1. That description is contradictory: a function that is 0 almost everywhere integrates to 0, even if you work in extended […] Test functions first appeared on John D. Cook.  ( 6 min )
    Groups vs Abelian groups: Pedantic or profound?
    This article will probably only be of interest to a small number of readers. Those unfamiliar with category theory may find it bewildering, and those well versed in category theory may find it trivial. My hope is that someone in between, someone just starting to get a handle on category theory, will find it helpful. […] Groups vs Abelian groups: Pedantic or profound? first appeared on John D. Cook.  ( 7 min )
  • Open

    DALL·E 3 system card
    No content preview  ( 1 min )

  • Open

    [Discussion] I didn't do well in Calculus III
    So I got an A in calculus three but I probably didn't deserve it since it was online and all I did was look up the answer and understand the problems given on the test. So I probably have a C level understanding. Will I be tested on calc 3 knowledge in machine learning or should I retake calc 3? submitted by /u/Glittering-Target-87 [link] [comments]  ( 9 min )
    [P] Hand keypoint detection
    Hello Reddit, I have a question regarding the right tool. I'm looking for a tool/model to detect hand keypoints in a video stream of a person assembling stuff. I know OpenPose is a possible option, as is Google MediaPipe. I'm not really getting along with OpenPose, and MediaPipe doesn't show really good results. In my project, I would like to detect hand keypoints in assembly scenarios. It would be OK to use 2 cameras or a depth camera if necessary. Does anybody know any models/tools to use? Thanks in advance :) submitted by /u/VGHMD [link] [comments]  ( 9 min )
    [P] Best option for a large, local embedding database?
    Langchain offers a wide array of vector databases for text embedding models. I need to create a vector database for around 3 million sentence embeddings, each being of dimension 384. I'm building a prototype, so it has to be local and free of charge to use. So far, I've hit limits for Chroma (41,666 max). I've also tried Redis, QDrant and FAISS - each of these gets so large that it eats up all the RAM and the process gets killed, or with QDrant, just errors out. I've used Pinecone before, but I don't really want to pay for a prototype as I have plenty of disk space. I was thinking of chunking the 3 million documents into local vector stores of size 41,666 using ChromaDB - but there isn't a whole lot out there about whether Chroma would allow me to merge all ~70 of these smaller databases into a bigger one for search. I also cannot find whether it would be possible to load all 70 of these into memory and search each one individually. So what are my options? My other thought was just creating a large Doc2Vec model, however I would like to use something more sophisticated like Huggingface embedding models. submitted by /u/russ_fegoli [link] [comments]  ( 9 min )
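    For what it's worth, 3 million 384-dimensional float32 vectors are roughly 4.6 GB, so a compressed FAISS index is one local, free option that usually fits comfortably in RAM and can be persisted to disk. A minimal sketch (parameters are illustrative, not tuned):

        # Product-quantized IVF index: ~48 bytes per vector instead of 1536 for raw float32.
        import numpy as np
        import faiss

        d = 384
        xb = np.random.rand(50_000, d).astype("float32")     # stand-in for the real embeddings

        quantizer = faiss.IndexFlatL2(d)
        index = faiss.IndexIVFPQ(quantizer, d, 256, 48, 8)    # 256 clusters, 48 sub-quantizers, 8 bits each
        index.train(xb)
        index.add(xb)

        faiss.write_index(index, "sentences.ivfpq")           # persist; reload with faiss.read_index
        index.nprobe = 16                                      # search more clusters for better recall
        D, I = index.search(xb[:5], 10)                        # top-10 neighbours for 5 queries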
    [D] Proof of convergence for a heavy-ball adaptive step-size algorithm for non-convex functions
    Hello everyone, I am struggling with proving convergence for an optimizer which uses an adaptive step-size with the heavy-ball algorithm, for convex and non-convex functions. In some literature, I could find a regret-bound analysis/proof for convex functions, and a proof that the estimated gradient goes to zero as $k \to \infty$ for non-convex functions. There are some assumptions and preconditions. The algorithm is heavy-ball momentum with an adaptive step-size: $x_{k+1} = x_k - \eta_k \nabla f(x_k) + \beta (x_k - x_{k-1})$. The following assumptions are made: A. The function is smooth. B. The function is Lipschitz. C. The gradients are Lipschitz. I attempt to prove convergence to a critical point or a local minimum, where the expected gradient at iteration k goes to zero, i.e. $\mathbb{E}[\|\nabla f(x_k)\|] \to 0$ as $k \to \infty$. Could anyone please guide me through the process of a convergence proof for non-convex functions, or give me literature recommendations for the same? Thank you very much in advance. submitted by /u/Loose_Foundation5990 [link] [comments]  ( 9 min )
    [D] open problems after GPT4 capabilities
    We all know that LLMs (and especially foundation models) are extremely functionally capable. Has anyone made a nice list of deficiencies that they show? I know Gary Marcus did so many years ago, but after GPT3 and GPT4 -- what is still unsolved? submitted by /u/Cultural-Average3959 [link] [comments]  ( 9 min )
    [D] Hoeffding's inequality, does it make sense practically?
    According to it, increasing the size of the hypothesis set loosens the upper bound on the gap between in-sample and out-of-sample error. Can't we subdivide the hypothesis set into multiple smaller ones, ensuring tighter bounds in general? And generally, have you seen it used in practice before? I have seen a lot of ML projects without anybody mentioning it or anything theoretical. submitted by /u/2azo [link] [comments]  ( 9 min )
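    For intuition, the union-bound form of the inequality for a finite hypothesis set of size M can be evaluated directly; a small sketch with illustrative numbers:
        import math

        def hoeffding_bound(eps, N, M):
            """P( max over the M hypotheses of |E_in - E_out| > eps ) <= 2 * M * exp(-2 * eps**2 * N)."""
            return 2 * M * math.exp(-2 * eps**2 * N)

        N, eps = 10_000, 0.05
        for M in (1, 100, 10_000):
            print(M, hoeffding_bound(eps, N, M))
        # The bound loosens linearly in M. Subdividing the hypothesis set does not tighten the
        # guarantee for the hypothesis you eventually pick, because it could come from any of the
        # subsets, so the union bound over all of them is what applies.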
    [P] Good models to use for multimodal object detection when both the modalities are image based or some models which support ensembling?
    So basically I have a dataset with images of vehicles in top down view in both RGB and IR, what are some models I can use for both unimodal and multimodal object detection to compare their performance. Links to GitHub repos would be helpful. Thanks submitted by /u/Xyber5 [link] [comments]  ( 9 min )
    Benefits of converting DICOM images to PNGs [P]
    I'm trying to understand what the benefits are of converting DICOM images to PNGs. Context: I have DICOM images from which I already extracted the useful metadata I want to use. Those images are for a classification-detection pipeline for some disease. So, as I already asked, what are the benefits of converting those DICOM files to PNGs rather than just using pydicom and the DICOM pixel_array? The reason I ask is that I saw many top-5 users on Kaggle do this when dealing with DICOM images. If I understand how networks actually work, they take as input an array of pixels as floating-point numbers, no? So what's the difference between a DICOM pixel_array and a PNG's pixel array as a numpy array or tensor? Both will eventually be fed to the network as a tensor of floating-point numbers. Is it because PNGs are usually faster to train on? Is it because PNGs have more library support for preprocessing / augmentation / etc.? Is it because PNGs are the format many pre-trained models expect? (I write this knowing it's 99% not true, as mentioned with the tensor thing) Thanks in advance, and please forgive my English (I could use AI tools to fix it but I feel addicted already) submitted by /u/01jasper [link] [comments]  ( 9 min )
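    For what it's worth, the pixel_array can be fed to a network directly after the same normalization you would apply to a PNG; the main thing an 8-bit PNG conversion changes is quantizing the original 12-16-bit range down to 256 levels. A minimal sketch (file path is a placeholder):
        import numpy as np
        import pydicom
        import torch

        ds = pydicom.dcmread("study/slice_001.dcm")          # placeholder path
        img = ds.pixel_array.astype(np.float32)              # raw pixel data, often 12-16 bit

        # Apply rescale slope/intercept if present (e.g. to recover Hounsfield units for CT)
        img = img * float(getattr(ds, "RescaleSlope", 1.0)) + float(getattr(ds, "RescaleIntercept", 0.0))

        # Normalize to [0, 1] and build a 1xHxW tensor, exactly as you would after loading a PNG
        img = (img - img.min()) / (img.max() - img.min() + 1e-8)
        tensor = torch.from_numpy(img).unsqueeze(0)
        print(tensor.shape, tensor.dtype)
    The Kaggle habit is usually about faster decoding, smaller files, and a more convenient augmentation ecosystem rather than anything the network itself needs.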
    [D] What kind of distribution is this?
    Hey guys, I am wondering what kind of distribution my data are following? I want to fit a distribution function to them and use this fitted distribution function to generate new samples with a given mean and standard deviation (python). Any tips for this? Happy to hear your suggestions :) https://preview.redd.it/kdcftvpq8urb1.png?width=408&format=png&auto=webp&s=6163b9f571069e098c9e9a609c3d1cb9910fe1fb submitted by /u/Tigmib [link] [comments]  ( 9 min )
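    Without seeing the histogram it's hard to name the distribution, but a quick way to compare candidates is to fit a few with scipy and keep the one with the smallest Kolmogorov-Smirnov statistic; a hedged sketch (file name and candidate list are assumptions):
        import numpy as np
        from scipy import stats

        data = np.loadtxt("samples.txt")                     # placeholder: your 1-D observations

        candidates = {"norm": stats.norm, "lognorm": stats.lognorm, "gamma": stats.gamma}
        best = min(
            ((name, dist, dist.fit(data)) for name, dist in candidates.items()),
            key=lambda t: stats.kstest(data, t[0], args=t[2]).statistic,
        )
        name, dist, params = best
        print("best fit:", name, params)
        new_samples = dist.rvs(*params, size=1000)           # draw fresh samples from the fitted law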
    [R] Efficient Streaming Language Models with Attention Sinks - Meta AI 2023 - StreamingLLM enables Llama-2, Falcon and Pythia to have an infinite context length without any fine-tuning! Allows streaming use of LLMs!
    Paper: https://arxiv.org/abs/2309.17453 Github: https://github.com/mit-han-lab/streaming-llm Abstract: Deploying Large Language Models (LLMs) in streaming applications such as multi-round dialogue, where long interactions are expected, is urgently needed but poses two major challenges. Firstly, during the decoding stage, caching previous tokens' Key and Value states (KV) consumes extensive memory. Secondly, popular LLMs cannot generalize to longer texts than the training sequence length. Window attention, where only the most recent KVs are cached, is a natural approach -- but we show that it fails when the text length surpasses the cache size. We observe an interesting phenomenon, namely attention sink, that keeping the KV of initial tokens will largely recover the performance of wind…  ( 9 min )
    [Project] I just released an open-source package, TorchLens, that can extract the activations/metadata from any PyTorch model, and visualize its structure, in just one line of code. I hope it helps you out!
    You just give it any PyTorch model (as-is, no changes needed), and it spits out a data structure with the activations of any layer you want, along with a bunch of metadata about the model and each layer and an optional automatic visualization of the model's computational graph. I hope this greatly speeds up the process of extracting features from models for further analysis, and also serves as an aid in quickly understanding new models. I also hope it'd be helpful for teaching purposes, too. It is meant to work for any PyTorch model whatsoever and I've tested it on hundreds of models (see the "model menagerie" of visualizations below), though it's always possible I've missed some edge case or another. Hope it helps you out--I'm still actively developing it, so let me know if there's anything on your wishlist! https://preview.redd.it/k37nhejvxtrb1.png?width=640&format=png&auto=webp&s=5713a8711110644794e2264d84dd479ede861c5e GitHub Repo Twitter Thread Paper CoLab Tutorial Gallery of Model Visuals submitted by /u/therealjmt91 [link] [comments]  ( 9 min )
    [D] Why Vision Transformers?
    Transformers have been the new kid on the block, and it's easy to see why with LLMs and sequential output generation, but I still don't know why vision transformers based on ViT are so hot in the field right now. From my understanding, CNNs are just vastly better than transformers for vision tasks, as their inductive biases allow them to determine the relationship between neighboring features of an image via pooling and filters. However, transformers don't have this kind of inductive bias, and as a result, take much more data and compute to reach similar levels of performance. I read this survey paper on Vision Transformers here: https://arxiv.org/pdf/2012.12556.pdf, which has the performance of CNNs vs various transformer models for CV. Comparing even the best vision transformers to the classic …  ( 10 min )
    [R] Tool-Integrated Reasoning: A New Approach for Math-Savvy LLMs
    When trying to get language models to solve complex math problems, researchers kept running into limits. Models like GPT-3 and ChatGPT still struggle with advanced algebra, calculus, and geometry questions. The math is just too abstract and symbol-heavy for them. To break through this barrier, researchers from Tsinghua University and Microsoft taught models to combine natural language reasoning with calling external math tools. The key is their new "tool-integrated reasoning" format. Models generate a natural language plan first, then write code to invoke tools like SymPy to solve equations. They take the output results and continue verbal reasoning. By interleaving natural language and symbolic computations, they get the best of both worlds - semantic understanding from language models and rigorous math from tools. They trained versions of the LLaMA model this way, producing their Tool-Integrated Reasoning Agent (TORA). They present some strong results: In evaluations on 10 math datasets, TORA substantially outperformed prior state-of-the-art methods, achieving 13-19% higher accuracy on average. On one competition test, TORA-7B scored 40% accuracy, beating the previous best model by 22 percentage points. This demonstrates that integrating tools directly into the reasoning process can significantly enhance mathematical capabilities, even for large models like GPT-4. However, tough problems involving geometry and advanced algebra are still there. New techniques for symbolic reasoning and spatial understanding will likely be needed to push further. Overall though, tool integration seems a promising path to improve reasoning skills. Applying this to other domains like logic and programming could also be impactful. TLDR: Teaching language models to use math tools helps them solve way more complex problems. Full Paper Summary arXiv Link submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
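    The prompting format can be sketched as a simple loop (this is an illustrative reconstruction, not the authors' code; ask_llm is a hypothetical stand-in for whatever model API is used):
        import re
        import sympy  # the executed tool; the model only has to emit small SymPy snippets

        def ask_llm(prompt: str) -> str:
            """Hypothetical stand-in for a call to GPT-4, LLaMA, etc."""
            raise NotImplementedError

        def run_tool(code: str) -> str:
            """Execute a model-written SymPy snippet and return the value it stores in `result`."""
            scope = {"sympy": sympy}
            exec(code, scope)                                 # sandboxing caveats apply
            return str(scope.get("result", ""))

        def tool_integrated_reasoning(question: str, max_rounds: int = 4) -> str:
            trajectory = question
            for _ in range(max_rounds):
                step = ask_llm(trajectory)                    # verbal plan + optional ```python``` block
                trajectory += "\n" + step
                match = re.search(r"```python(.*?)```", step, re.S)
                if match is None:                             # no tool call -> treat as the final answer
                    return step
                trajectory += "\nTool output: " + run_tool(match.group(1))
            return trajectory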
    [P] Awesome AI developer productivity Github repo
    Hello everyone, We've begun gathering a variety of AI coding tools used in one place to make things easier for everyone. We're inviting everyone to check out our collection, and maybe even add tools you find useful. You can find the repository here: https://github.com/gaborsoter/awesome-ai-dev-productivity Feel free to explore and contribute! submitted by /u/BootstrapGuy [link] [comments]  ( 9 min )
    [R] On the Biometric Capacity of Generative Face Models
    We developed a statistical model to estimate “How many unique identities can a generative face model generate?” without exhaustively generating a lot of faces. Abstract: There has been tremendous progress in generating realistic faces with high fidelity over the past few years. Despite this progress, a crucial question remains unanswered: “Given a generative face model, how many unique identities can it generate?” In other words, what is the biometric capacity of the generative face model? A scientific basis for answering this question will benefit evaluating and comparing different generative face models and establish an upper bound on their scalability. This paper proposes a statistical approach to estimate the biometric capacity of generated face images in a hyperspherical feature space. We employ our approach on multiple generative models, including unconditional generators like StyleGAN, Latent Diffusion Model, and “Generated Photos,” as well as DCFace, a class-conditional generator. We also estimate capacity w.r.t. demographic attributes such as gender and age. Our capacity estimates indicate that (a) under ArcFace representation at a false acceptance rate (FAR) of 0.1%, StyleGAN3 and DCFace have a capacity upper bound of 1.43 million and 11,900, respectively; (b) the capacity reduces drastically as we lower the desired FAR with an estimate of 17,960 and 562 at FAR of 1% and 10%, respectively, for StyleGAN3; (c) there is no discernible disparity in the capacity w.r.t gender; and (d) for some generative models, there is an appreciable disparity in the capacity w.r.t age. Paper: https://arxiv.org/abs/arXiv:2308.02065 Code: https://github.com/human-analysis/capacity-generative-face-models submitted by /u/VishDev [link] [comments]  ( 9 min )
    [P] Comgra: A library for debugging and understanding neural networks
    I'm a machine learning engineer and researcher. I got fed up with how difficult it is to understand why neural networks behave the way they do, so I wrote a library to help with it. Comgra (computation graph analysis) is a library you can use with PyTorch to extract all the tensor data you care about and visualize it graphically in a browser. This allows for a much more detailed analysis of what is happening than the usual approach of using TensorBoard. You can go investigate tensors as training proceeds, drill down into individual neurons, inspect single data sets that are of special interest to you, track gradients, compare statistics between different training runs, and more. This tool has saved me a ton of time in my research by letting me check my hypotheses much more quickly than normal and by helping me understand how the different parts of my network really interact. I first published this a month ago and have made some improvements since then. I would be happy to hear even more feedback! My goal is to make this the go-to library used both by novices who want to understand what's going on under the hood, and by researchers in neural architecture design. submitted by /u/Smart-Emu5581 [link] [comments]  ( 9 min )
    [D] The most complete Audio ML toolkit 🚀
    Hugging Face Transformers is a complete audio toolkit that provides state-of-the-art models for all audio tasks, including TTS, ASR, audio embeddings, audio classification and music generation. All you need to do is install the Transformers package: pip install --upgrade transformers And then all of these models can be used in just 3 lines of code. TTS example usage:
        from transformers import pipeline
        generator = pipeline("text-to-speech", model="suno/bark-small")
        speech = generator("Hey - it's Hugging Face on the phone!")
    Available models: Bark https://huggingface.co/suno/bark, MMS TTS https://huggingface.co/facebook/mms-tts-eng, VITS https://huggingface.co/kakao-enterprise/vits-vctk, SpeechT5 https://huggingface.co/microsoft/speecht5_tts, and more! https://huggingface.co/mo…  ( 9 min )
    [R] The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) - Microsoft 2023 - 166 Pages!
    Paper: https://arxiv.org/abs/2309.17421 Youtube: https://youtu.be/Q0pP782dSh0?si=MiJAlK5k-KEyQ-Zr Abstract: Large multimodal models (LMMs) extend large language models (LLMs) with multi-sensory skills, such as visual understanding, to achieve stronger generic intelligence. In this paper, we analyze the latest model, GPT-4V(ision), to deepen the understanding of LMMs. The analysis focuses on the intriguing tasks that GPT-4V can perform, containing test samples to probe the quality and genericity of GPT-4V's capabilities, its supported inputs and working modes, and the effective ways to prompt the model. In our approach to exploring GPT-4V, we curate and organize a collection of carefully designed qualitative samples spanning a variety of domains and tasks. Observations from these samples demonstrate that GPT-4V's unprecedented ability in processing arbitrarily interleaved multimodal inputs and the genericity of its capabilities together make GPT-4V a powerful multimodal generalist system. Furthermore, GPT-4V's unique capability of understanding visual markers drawn on input images can give rise to new human-computer interaction methods such as visual referring prompting. We conclude the report with in-depth discussions on the emerging application scenarios and the future research directions for GPT-4V-based systems. We hope that this preliminary exploration will inspire future research on the next-generation multimodal task formulation, new ways to exploit and enhance LMMs to solve real-world problems, and gaining better understanding of multimodal foundation models. https://preview.redd.it/qkytzg2rjqrb1.jpg?width=511&format=pjpg&auto=webp&s=fc306dc6ae64100e993639f8e27583b809bf8a5c https://preview.redd.it/z4kq0l2rjqrb1.jpg?width=507&format=pjpg&auto=webp&s=d4fda59456846fa7a6c9b318b21fc9c544bd2b68 https://preview.redd.it/1ptrkk2rjqrb1.jpg?width=712&format=pjpg&auto=webp&s=2b44fbc949e76fdf20d05b1236f56c87ba5efece ​ ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [P] NanoPhi, Implementing some of the success of Phi-1.5, with GPT-2(124m)
    Hi, i'm trying to replicate at least some of the success of Phi 1.5 on a model 10x smaller, gpt-2 124m. I have started with model finetuning, and have a simple github with roadmap, https://github.com/VatsaDev/NanoPhi, check it out there! submitted by /u/vatsadev [link] [comments]  ( 9 min )
  • Open

    Code Llama code generation models from Meta are now available via Amazon SageMaker JumpStart
    Today, we are excited to announce Code Llama foundation models, developed by Meta, are available for customers through Amazon SageMaker JumpStart to deploy with one click for running inference. Code Llama is a state-of-the-art large language model (LLM) capable of generating code and natural language about code from both code and natural language prompts. Code […]  ( 11 min )
    Build an end-to-end MLOps pipeline for visual quality inspection at the edge – Part 1
    A successful deployment of a machine learning (ML) model in a production environment heavily relies on an end-to-end ML pipeline. Although developing such a pipeline can be challenging, it becomes even more complex when dealing with an edge ML use case. Machine learning at the edge is a concept that brings the capability of running […]  ( 10 min )
    Build an end-to-end MLOps pipeline for visual quality inspection at the edge – Part 2
    In Part 1 of this series, we drafted an architecture for an end-to-end MLOps pipeline for a visual quality inspection use case at the edge. It is architected to automate the entire machine learning (ML) process, from data labeling to model training and deployment at the edge. The focus on managed and serverless services reduces […]  ( 9 min )
    Build an end-to-end MLOps pipeline for visual quality inspection at the edge – Part 3
    This is Part 3 of our series where we design and implement an MLOps pipeline for visual quality inspection at the edge. In this post, we focus on how to automate the edge deployment part of the end-to-end MLOps pipeline. We show you how to use AWS IoT Greengrass to manage model inference at the […]  ( 9 min )
  • Open

    [D] RL agenda after LLMs or S4?
    Many other students at my research institution are pretty worried, after ChatGPT / LLMs, about continuing to work in RL and are thinking of leaving the field. What are the main open problems in RL now that LLMs and S4 can solve a hefty chunk of sequence learning problems? submitted by /u/Cultural-Average3959 [link] [comments]  ( 9 min )
    RLHF without GAE
    If I already have a trained reward model, say a sentiment classification model, that I'd like to use for PPO-based RLHF, I believe the standard method would be to instantiate the Critic/value function using the reward model, and train it further during PPO, correct? Would it even make sense to try PPO for RLHF without using the GAE term and thus without the value function, and just directly using the reward model's output as the advantage? It seems that this would require viewing the entire generation as a single action (rather than each token's generation as an action), but most of the articles I've read on RLHF seem to treat it that way. On the other hand, all the code implementations I've seen have an Actor-Critic model producing values at each token, which I think implies that each token is an action. Edit: Apologies if any of this is just me having fundamental gaps in my understanding! submitted by /u/ganzzahl [link] [comments]  ( 9 min )
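    A small numerical sketch of the two formulations being contrasted (illustrative numbers; in typical RLHF the reward model only scores the finished sequence, so per-token rewards are zero except at the last token):
        import numpy as np

        def gae(rewards, values, gamma=1.0, lam=0.95):
            """Token-level GAE: requires a value estimate V(s_t) at every token position."""
            adv, last = np.zeros(len(rewards)), 0.0
            for t in reversed(range(len(rewards))):
                next_v = values[t + 1] if t + 1 < len(rewards) else 0.0
                delta = rewards[t] + gamma * next_v - values[t]
                last = delta + gamma * lam * last
                adv[t] = last
            return adv

        rewards = np.array([0.0, 0.0, 0.0, 0.0, 0.83])   # reward model fires only at the end
        values = np.array([0.10, 0.20, 0.30, 0.50, 0.70])
        print(gae(rewards, values))

        # Sequence-level alternative (whole generation = one action): a single advantage,
        # e.g. final reward minus a simple baseline, broadcast to every token.
        baseline = 0.5
        print(rewards[-1] - baseline)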
    3-player graph pursuit game
    So I am trying to find a NE using RL algorithms for a turn-based deterministic graph pursuit game. I have a way of checking if the strategies of players 1, 2, 3 are a NE, and thought of using Q-learning and seeing if it converges to a NE. Thus far it doesn't seem to work and I wonder if I made a mistake. The state is described as St = [x1 x2 x3 p], where the current player is p and x1, x2, x3 are the locations of the players in the graph. Players have value functions Q^1(St), Q^2(St), Q^3(St). The way I update my value function is: player i chooses an e-greedy action a_t, leading to the new state St_new, and then Q^i(St) = (1-alpha)*Q^i(St) + alpha*gamma*Q(St_new). I have tried using a memory buffer but I haven't improved the convergence success. I check whether the values are a NE every 1000 iterations. It only converges for simple graphs. Do you think the way I update my value function is correct? Do you have any other traditional algorithms to suggest? Shall I move to deep learning? I am worried that if simple algorithms can't converge, the neural networks won't either... I tried to implement Nash Q-learning following the paper https://www.jmlr.org/papers/volume4/hu03a/hu03a.pdf but I am not sure if I implemented it correctly for a turn-based game. submitted by /u/__gp_ [link] [comments]  ( 9 min )
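    For comparison, a minimal sketch of the textbook single-agent tabular target, which includes the immediate reward r and a max over next actions; whether dropping those terms is appropriate for the turn-based multi-agent setting is exactly the thing worth double-checking:
        def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
            """Standard tabular Q-learning on a dict-based table: target = r + gamma * max_a' Q(s', a')."""
            target = r + gamma * max(Q.get((s_next, a2), 0.0) for a2 in actions)
            Q[(s, a)] = (1 - alpha) * Q.get((s, a), 0.0) + alpha * target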
  • Open

    Save 20 Hours A Week With This 1 Simple ChatGPT Prompt for Productivity
    submitted by /u/Senior_tasteey [link] [comments]  ( 8 min )
    ‘AI Anxiety’ Is on the Rise–Here’s How to Manage It
    Artificial intelligence (AI) anxiety is on the rise, but there are ways to manage it. While AI may outperform humans in certain tasks, humans are not yet headed for all-out replacement. Recent research shows that AI programs scored higher than humans in tasks requiring originality, but the highest-rated human ideas were still considered more creative. The rise of generative AI tools in industries like animation has left some professionals anxious about the future of their work. Experts suggest managing AI fears by understanding the historical context of technological advancements and focusing on the benefits and training opportunities that AI brings. Source : https://www.scientificamerican.com/article/ai-anxiety-is-on-the-rise-heres-how-to-manage-it/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Tool-Integrated Reasoning: A New Approach for Math-Savvy LLMs
    When trying to get language models to solve complex math problems, researchers kept running into limits. Models like GPT-3 and ChatGPT still struggle with advanced algebra, calculus, and geometry questions. The math is just too abstract and symbol-heavy for them. To break through this barrier, researchers from Tsinghua University and Microsoft taught models to combine natural language reasoning with calling external math tools. The key is their new "tool-integrated reasoning" format. Models generate a natural language plan first, then write code to invoke tools like SymPy to solve equations. They take the output results and continue verbal reasoning. By interleaving natural language and symbolic computations, they get the best of both worlds - semantic understanding from language models and rigorous math from tools. They trained versions of the LLaMA model this way, producing their Tool-Integrated Reasoning Agent (TORA). They present some strong results: In evaluations on 10 math datasets, TORA substantially outperformed prior state-of-the-art methods, achieving 13-19% higher accuracy on average. On one competition test, TORA-7B scored 40% accuracy, beating the previous best model by 22 percentage points. This demonstrates that integrating tools directly into the reasoning process can significantly enhance mathematical capabilities, even for large models like GPT-4. However, tough problems involving geometry and advanced algebra are still there. New techniques for symbolic reasoning and spatial understanding will likely be needed to push further. Overall though, tool integration seems a promising path to improve reasoning skills. Applying this to other domains like logic and programming could also be impactful. TLDR: Teaching language models to use math tools helps them solve way more complex problems. Full Paper Summary arXiv Link submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    Bard, what would it take to fast-track AI morality so that it surpasses human ability by 2025?
    To fast-track AI morality to surpass human ability by 2025, Google, OpenAI, Anthropic, and other developers could: Invest heavily in AI ethics research and development. For example, Google AI has launched the Responsible AI Research and Development (RAIR) program, which is focused on developing new AI ethics research and tools. Create large-scale datasets of human moral reasoning. For example, OpenAI has released the Moral Machine dataset, which contains over 1 million crowdsourced responses to moral dilemmas. Develop new AI architectures that are specifically designed for moral reasoning. For example, Anthropic is developing a new AI architecture called "Moral Machine Learning," which is designed to learn from human moral reasoning data. Establish a global AI ethics research consortium and create a public database of human moral reasoning datasets. For example, the Partnership on AI has launched the Safeguarding AI initiative, which is working to develop new safety mechanisms for AI systems. Fund research into developing new AI architectures for moral reasoning and develop new AI evaluation metrics for moral performance. For example, the Moral Machine project is developing new evaluation metrics for AI systems' moral performance. By working together, Google, OpenAI, Anthropic, and other developers can help to fast-track AI morality and create AI systems that are more moral than humans. (199 words) submitted by /u/Georgeo57 [link] [comments]  ( 9 min )
    AI & Us Navigating the Digital Renaissance
    submitted by /u/Einsof__ [link] [comments]  ( 8 min )
    Prompt engineering questions
    Is prompt engineering a legit job? Is it here to stay? Is it worth studying? What's the best way to study it, land a job, or start freelancing? submitted by /u/metasubcon [link] [comments]  ( 8 min )
    What app/program are they using on this Instagram?
    How does one make videos like on this Instagram page? It's unreal. https://instagram.com/nostalgicraindrops?igshid=MzRlODBiNWFlZA== submitted by /u/CK1886 [link] [comments]  ( 8 min )
    ChatGPT Can Now See? Mind-Blowing Ways People Can Use Image Recognition!
    submitted by /u/Senior_tasteey [link] [comments]  ( 8 min )
    Let’s make a list of the BEST AI TOOLS for students
    Every day, new AI tools appear. There are also AI tools designed to make students' lives easier—from AI essay generators to study organizers. While there are many directories with AI tools, they are often not well-sorted for students. So, I've compiled a list of my favorite AI tools for educational purposes.
    - Bing Chat: writing Excel formulas, making graphs and charts, answers for homework assignments, researching for a paper
    - Textero.ai: searching for relevant academic sources for essays, research assistance with the "Ask AI" feature, essay generation and paper formatting, structured essay outline creation, summarizing texts
    - ChatPDF: interacting with academic PDFs, asking specific questions about the content, quickly locating essential data for assignments
    - Socratic: breaking down complex homework questions, step-by-step educational guidance, safe and interactive learning
    - Writely AI: improving grammar and writing clarity, creating concise study notes, feedback on content quality
    - Turnitin: checking for copied content, comparing against a vast academic database, highlighting potential plagiarism
    Got any to add to the list? Let's share and help each other! submitted by /u/loyallyUrticate [link] [comments]  ( 9 min )
    Tested Dalle, created a monster.
    submitted by /u/Grindmaster_Flash [link] [comments]  ( 8 min )
    Meta's Llama 2 Long outperforms GPT 3.5 and Claude 2
    Meta Platforms recently introduced Llama 2 Long, a revolutionary AI model that outperforms top competitors with its ability to generate accurate responses to long user queries. For the latest advancements in AI, look here first. https://preview.redd.it/geqqd3k5rprb1.png?width=1920&format=png&auto=webp&s=e72a67fc7ef7e85902169f3061529c136beadc87 Meta's new AI model As an enhancement of the original Llama 2, Llama 2 Long deals with larger data containing longer texts and is modified to handle lengthier information sequences. Its stellar performance outshines other models such as OpenAI's GPT-3.5 Turbo and Claude 2. How Llama 2 Long works Meta built different versions of Llama 2, ranging from 7 billion to 70 billion parameters, which refines its learning from data. Llama 2 Long employs Rotary Positional Embedding (RoPE) technique, refining the way it encodes the position of each token, allowing fewer data and memory to produce precise responses. The model further fine-tunes its performance using reinforcement learning from human feedback (RLHF), and synthetic data generated by Llama 2 chat itself. Impressive feats and future aspirations Llama 2 Long can create high-quality responses to user prompts up to 200,000 characters long, which is approximately 40 pages of text. Its ability to generate responses to queries on diverse topics such as history, science, literature, and sports indicates its potential to cater to complex and various user needs. The researchers see Llama 2 Long as a step towards broader, more adaptable AI models, and advocate for more research and dialogue to harness these models responsibly and beneficially. (source) P.S. If you like this kind of analysis, I write a free newsletter that tracks the most relevant news and developments in AI. Professionals from Meta, Google, and OpenAI are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    AI Image Generator That Is Good At Referencing Pop Culture
    I've recently tried Canva and Dall-E to generate an image that references two popular games, Dark Souls 3 and Baldur's Gate 3. And they both fall on their face. Maybe my prompt is bad but Canva is not getting me what I want. Dall-E ran out of free credits. Do you guys have any recommendations. Midjourney is no longer free now. I would like this to be free and has good references to popular culture. submitted by /u/livingroomsessions [link] [comments]  ( 9 min )
  • Open

    Awarded DAGM MVTec Dissertation Award 2023
    In September, I received the DAGM MVTec dissertation award 2023 for my PhD thesis. DAGM is the German association for pattern recognition and organizes the German Conference on Pattern Recognition (GCPR) which is Germany's prime conference for computer vision and related research areas. I feel particularly honored by this award since my academic career started with my first paper published as part of the young researcher forum at GCPR 2015 in Aachen. The post Awarded DAGM MVTec Dissertation Award 2023 appeared first on David Stutz.  ( 3 min )
  • Open

    Supereggs, squigonometry, and squircles
    The Depths of Wikipedia twitter account posted a screenshot about supereggs that's popular at the moment. It says "there's no way this is real. they must be making these words up" above a screenshot from the Wikipedia article on supereggs saying "The definition can be changed to have an equality rather than an inequality; this […]" Supereggs, squigonometry, and squircles first appeared on John D. Cook.  ( 5 min )
    Corny AI
    Meredith Whittaker posted on Twitter that In addition to being the best in privacy, Signal is also the best in not subjecting you to corny ‘AI’ features no one asked for or wants. I love the phrase “corny AI.” That’s exactly what a lot of AI features are. “Would you like help composing that tweet?” […] Corny AI first appeared on John D. Cook.  ( 5 min )
    Today’s star
    The star-like image above is today’s exponential sum. The exponential sum page on my site generates a new image each day by putting the numbers of the day’s month, day, and year into the equation and connecting the partial sums in the complex plane. Here m is the month, d is the day, and y […] Today’s star first appeared on John D. Cook.  ( 5 min )
  • Open

    A more effective experimental design for engineering a cell into a new state
    By focusing on causal relationships in genome regulation, a new AI method could help scientists identify new immunotherapy techniques or regenerative therapies.  ( 10 min )
    Is AI in the eye of the beholder?
    Study shows users can be primed to believe certain things about an AI chatbot’s motives, which influences their interactions with the chatbot.  ( 10 min )
  • Open

    Fishing with Neural Nets | Transforming Ecology with Artificial Intelligence
    submitted by /u/plutoandmal [link] [comments]  ( 8 min )
  • Open

    Google at ICCV 2023
    Posted by Shaina Mehta, Program Manager, Google Google is proud to be a Platinum Sponsor of the International Conference on Computer Vision (ICCV 2023), a premier annual conference, which is being held this week in Paris, France. As a leader in computer vision research, Google has a strong presence at this year’s conference with 60 accepted papers and active involvement in 27 workshops and tutorials. Google is also proud to be a Platinum Sponsor for the LatinX in CV workshop. We look forward to sharing some of our extensive computer vision research and expanding our partnership with the broader research community. Attending ICCV 2023? We hope you’ll visit the Google booth to chat with researchers who are actively pursuing the latest innovations in computer vision, and check out som…  ( 94 min )

  • Open

    CGPT-4, how would an AI select the 100 most intelligent people on the planet based on their content rather than on their positions and awards?
    If an AI were to take on this task, it would primarily rely on data mining and natural language processing (NLP) techniques. First, the AI would scrape digital platforms to collect works authored by various people. Platforms could range from academic databases like PubMed or ArXiv to blogs, forums, and social media. The goal would be to capture a wide array of intellectual output, irrespective of the person's titles or accolades. Next, an NLP model would evaluate the collected content based on factors like originality, complexity, and coherence. Advanced sentiment analysis could be employed to gauge the depth of understanding and nuances in the arguments made. Specialized algorithms could also be developed to assess the impact of each piece of work, using metrics such as citations, social shares, or subsequent works that build upon it. The AI would then create a shortlist based on these evaluations. This stage might also involve unsupervised machine learning techniques like clustering to find patterns or commonalities among the top contenders. The final stage would be validation, possibly using reinforcement learning. The AI could simulate various scenarios or problems and predict how the content created by these individuals would contribute to solving them. It would then refine its list based on the simulated outcomes. This all-AI approach would drastically reduce human bias and could be executed relatively quickly. However, it's important to note that any such system would need to be designed carefully to avoid introducing biases present in the training data or algorithms. submitted by /u/Georgeo57 [link] [comments]  ( 9 min )
    So it's unethical to kill an AI robot
    submitted by /u/bharath_brt [link] [comments]  ( 9 min )
    How Big Tech is co-opting the rising stars of artificial intelligence
    Big Tech's dominance in the artificial intelligence (AI) industry is growing as start-ups like Anthropic rely on their computing power and resources. Despite creating breakthrough AI technology, these start-ups still need the support of Big Tech to scale and succeed. The training of AI systems is expensive and requires specialized computer chips and data centers, which are mostly controlled by Amazon, Google, and Microsoft. Regulators, including the Federal Trade Commission and French competition authorities, are monitoring the industry for signs of anticompetitive behavior. Some business leaders believe that competition and efficiency will eventually drive down the cost of running AI models. Source : https://www.washingtonpost.com/technology/2023/09/30/anthropic-amazon-artificial-intelligence/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Data strategy >> Generative AI strategy
    A strong data strategy is crucial for the success of any AI strategy. Generative AI use cases depend on a healthy data infrastructure, including data governance, observability, catalog, data sharing, and lineage. Many enterprises lack the necessary data infrastructure to deploy customer-facing AI apps confidently. Poor data strategy and infrastructure can derail generative AI efforts. Existing issues with data ecosystems, such as data silos and poor data governance, will have a greater impact on generative AI workloads than new issues. Data silos, poor data discoverability, and the lack of data interoperability can become serious bottlenecks for generative AI apps. Source : https://nextword.substack.com/p/data-strategy-matters-for-generative submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Does anyone know a good AI tool to generate tattoo ideas and song cover art?
    Same as title submitted by /u/No-Educator-59 [link] [comments]  ( 9 min )
    Meta, INRIA researchers discover that explicit registers eliminate ViT attention spikes
    When visualizing the inner workings of vision transformers (ViTs), researchers noticed weird spikes of attention on random background patches. This didn't make sense since the models should focus on foreground objects. By analyzing the output embeddings, they found a small number of tokens (2%) had super high vector norms, causing the spikes. The high-norm "outlier" tokens occurred in redundant areas and held less local info but more global info about the image. Their hypothesis is that ViTs learn to identify unimportant patches and recycle them as temporary storage instead of discarding. This enables efficient processing but causes issues. Their fix is simple - just add dedicated "register" tokens that provide storage space, avoiding the recycling side effects. Models trained with registers have: Smoother and more meaningful attention maps Small boosts in downstream performance Way better object discovery abilities The registers give ViTs a place to do their temporary computations without messing stuff up. Just a tiny architecture tweak improves interpretability and performance. Sweet! I think it's cool how they reverse-engineered this model artifact and fixed it with such a small change. More work like this will keep incrementally improving ViTs. TLDR: Vision transformers recycle useless patches to store data, causing problems. Adding dedicated register tokens for storage fixes it nicely. Full summary. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
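    The architectural tweak itself is tiny; a hedged sketch of the idea (not the authors' code), where a few learnable register tokens are prepended to the patch sequence and simply discarded before the prediction head:
        import torch
        import torch.nn as nn

        class ViTWithRegisters(nn.Module):
            """Wrap a transformer encoder so it sees a CLS token, N register tokens, then the patches."""
            def __init__(self, embed_dim=768, num_registers=4, depth=12, num_heads=12):
                super().__init__()
                self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
                self.registers = nn.Parameter(torch.zeros(1, num_registers, embed_dim))
                layer = nn.TransformerEncoderLayer(embed_dim, num_heads, batch_first=True)
                self.encoder = nn.TransformerEncoder(layer, depth)

            def forward(self, patch_tokens):                  # patch_tokens: (B, num_patches, embed_dim)
                B = patch_tokens.size(0)
                cls = self.cls_token.expand(B, -1, -1)
                reg = self.registers.expand(B, -1, -1)
                x = self.encoder(torch.cat([cls, reg, patch_tokens], dim=1))
                # Registers are dropped here: they only act as scratch space during attention.
                cls_out = x[:, 0]
                patch_out = x[:, 1 + reg.size(1):]
                return cls_out, patch_out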
    There's So Many AI Chatbots, But Which One Is The Best? (Complete Guide for 2023)
    submitted by /u/Senior_tasteey [link] [comments]  ( 9 min )
    One-Minute Daily AI News 10/1/2023
    Microsoft Researchers Introduce AutoGen: An Artificial Intelligence Framework for Simplifying the Orchestration, Optimization, and Automation of LLM Workflows.[1] StoriaBoard helps filmmakers, marketers and other storytellers pre-visualize stories. Simply upload your script, select a visual style, and generate hundreds of frames in seconds.[2] Will Hurd Releases A.I. Plan, a First in the Republican Presidential Field.[3] Sam Altman says AI systems will automate some tasks but also lead to ‘new and much better jobs’.[4] Sources: [1] https://www.marktechpost.com/2023/09/30/microsoft-researchers-introduce-autogen-an-artificial-intelligence-framework-for-simplifying-the-orchestration-optimization-and-automation-of-llm-workflows/?amp [2] https://www.producthunt.com/posts/storiaboard [3] https://www.nytimes.com/2023/09/20/us/politics/will-hurd-ai-plan.html [4] https://www.businessinsider.com/openai-sam-altman-ai-will-automate-tasks-create-better-jobs-2023-9?amp submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    This is no time for ease and comfort. It is time to dare and endure. -Winston Churchill
    submitted by /u/ApprehensiveChair460 [link] [comments]  ( 9 min )
    Quizlet AI reliability?
    What is everyone’s thoughts on the reliablity of the quizlet AI? I just talked to a friend and she said that she uses the AI to study with quizlet. submitted by /u/immickle [link] [comments]  ( 9 min )
  • Open

    [R] The unsolved mystery at the heart of the "How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions" paper
    submitted by /u/CellWithoutCulture [link] [comments]  ( 9 min )
    [D] How many instructions can LLMs handle before they start to ignore them?
    Prompt engineering frequently involves trying to encode very specific behaviors into a model to steer it a certain direction. In practice, as requirements become more complex, you often end up with fairly lengthy prompts, especially when using methods like RAG. I was wondering, how effective are LLMs at following instructions as the system prompt grows in size and complexity? I did some quick experiments on this and found that, unsurprisingly, GPT-4 can follow a lot of rules (up to 50) quite accurately. But even GPT-3.5 slowly degrades and Llama-2-70b-chat starts to fail after just a few rules. Comparison of performance metrics over increasing rule counts, demonstrating GPT-4's consistent performance and a decline in accuracy for GPT-3.5 and Llama-2-70b-chat. These results are based on …  ( 10 min )
    [R] LangDiversity: software to identify LLM errors
    Due to challenges such as hallucination, detecting errors in the output of a given prompt becomes an important challenge. LangDiversity is an implementation of "diversity measures" that are domain independent and can be used to measure the uncertainty in the result of a language model. Type pip install langdiversity Video: https://www.youtube.com/watch?v=86J_K9mR7lw Web: https://neurosymbolic.asu.edu/llm-correction/ Visit https://github.com/lab-v2/langdiversity Read the paper: https://arxiv.org/abs/2308.11189 https://preview.redd.it/rb0xg1ly8nrb1.png?width=1021&format=png&auto=webp&s=8e57056d24327ca2987abea12a7a9066a825738b submitted by /u/Neurosymbolic [link] [comments]  ( 9 min )
    [P] Simplest model to run with limited hardware
    We want to run (not train, i.e. think single forward pass only) an ML algorithm on a machine with very limited resources. Which model could we use to show off the possibilities? If the benchmark is an MLP for binary image classification, what else could we do with a similar scale of operations? For example, which model is the simplest for text-to-image generation? Are there any other ML models that are simple enough to run and, if initialized with good params, do something impressive? submitted by /u/2i2i_tokenized_time [link] [comments]  ( 9 min )
    [P] Deep Memory, a Way to Boost Retrieval Accuracy by up to +22% for RAG
    submitted by /u/davidbun [link] [comments]  ( 9 min )
    [D] Perplexity.ai Search Feasibility
    I've been using Perplexity.ai for a bit now, and it hit me that I don't understand how they can sustain their business model with search. APIs like Bing search and Google search cost around $5 or more per 1,000 searches, so how can they even afford to do this kind of search? Do they have their own search index? Also, how do they pull in the data from these sources so fast? I've played around with things like this using Langchain with retrieval, but splitting and tokenizing website HTML is not very fast. Have they already pre-scraped the websites from the search results and tokenized them for LLM retrieval? submitted by /u/dragon18456 [link] [comments]  ( 9 min )
    Metagpt use case [D]
    Guys, I am currently building a project that involves certain tasks, like building an ML model for certain use cases. I wish to automate this task; do you think MetaGPT is a good fit for it? Let me know if you need any further information!! EDIT: One of the tasks my app needs to do is convert images to text (I aim to implement image captioning). So, if I give MetaGPT the requirements for my project, is it possible it will give me the code I need? I want to offload certain tasks here so that I can focus more on the operations and design side. Edit: it seems such vague questions are not encouraged on this platform; I will keep working and will ask questions that are more specific and meet the standards of this platform. Thanks!! Always have massive respect for this community!! submitted by /u/aristotleTheFake [link] [comments]  ( 9 min )
    [R] Meta, INRIA researchers discover that explicit registers eliminate ViT attention spikes
    When visualizing the inner workings of vision transformers (ViTs), researchers noticed weird spikes of attention on random background patches. This didn't make sense since the models should focus on foreground objects. By analyzing the output embeddings, they found a small number of tokens (2%) had super high vector norms, causing the spikes. The high-norm "outlier" tokens occurred in redundant areas and held less local info but more global info about the image. Their hypothesis is that ViTs learn to identify unimportant patches and recycle them as temporary storage instead of discarding. This enables efficient processing but causes issues. Their fix is simple - just add dedicated "register" tokens that provide storage space, avoiding the recycling side effects. Models trained with registers have: Smoother and more meaningful attention maps Small boosts in downstream performance Way better object discovery abilities The registers give ViTs a place to do their temporary computations without messing stuff up. Just a tiny architecture tweak improves interpretability and performance. Sweet! I think it's cool how they reverse-engineered this model artifact and fixed it with such a small change. More work like this will keep incrementally improving ViTs. TLDR: Vision transformers recycle useless patches to store data, causing problems. Adding dedicated register tokens for storage fixes it nicely. Full summary. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [D] Multiple single class segmentation vs single multiclass segmentation models
    submitted by /u/waterstrider123 [link] [comments]  ( 9 min )
    [R] SOTA of Deep-Shallow Encoder-Decoder LLMs for fast inference
    There's some evidence [1] [2] that it's possible to run a text2text language model at substantially (potentially an order of magnitude) better inference speed by keeping the decoder shallow. I'm curious whether SOTA general-reasoner models in this style (a small model for machine translation is available at [3]) are publicly available. If not, how would one go about training one? Would it be necessary to do it entirely from scratch (extremely costly)? Or would it be possible to take, say, Flan-UL2 (20B), chop off its decoder, and train a much smaller decoder on top of it with the UL2 encoder frozen (i.e. how one trains adapter layers)? Assuming the decoder hyperparameters are kept small, would this be possible within a reasonable compute budget? Would it even meaningfully converge with a small amount of compute (assuming the same training objective as for UL2)? Would the strength (i.e. somewhat comparable to 10B if we cut 20B in half) transfer from the SOTA encoder, or would cutting off half of the model layers kneecap it too badly? [1] https://arxiv.org/pdf/2006.10369.pdf [2] https://aclanthology.org/2023.sustainlp-1.6.pdf [3] https://github.com/snoop2head/Deep-Encoder-Shallow-Decoder submitted by /u/upalse [link] [comments]  ( 9 min )
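    The cheap half of that experiment (freezing the big encoder and training only the decoder, adapter-style) is mechanically easy; a hedged sketch with an illustrative T5-style checkpoint, leaving aside the harder surgery of swapping in a genuinely shallower decoder:
        from transformers import AutoModelForSeq2SeqLM

        model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")  # illustrative choice

        # Freeze the encoder and the shared input embeddings; the decoder layers stay trainable.
        for p in model.encoder.parameters():
            p.requires_grad = False
        for p in model.shared.parameters():
            p.requires_grad = False

        trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
        total = sum(p.numel() for p in model.parameters())
        print(f"trainable: {trainable / 1e6:.1f}M of {total / 1e6:.1f}M parameters")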
    [D] Duplicating layers in large models
    Is there any notable work on duplicating layers in large feed-forward models? In contrast to, e.g., the brain, which is essentially a directed graph, most networks used nowadays follow a feed-forward approach. E.g. transformers are able to attend to past tokens, but generate tokens in a way where, for a given token, a given weight is not utilized at different stages of the feed-forward pass. My intuition is that this would lead to an issue where concepts (factual data as well as learned "algorithms") might be duplicated, as they are needed at different depths in the generation process and are sequentially dependent on one another. This does not directly make the model less capable, as it might learn the same concept at two layers sufficiently well, but it reduces data and parameter efficiency and might impact generalization capabilities. Using a full-on brain-like graph might be hard to implement/optimize/scale on current hardware and is tricky with backprop. But is there any work on duplicating a few layers and placing them at different depths in large models? I would guess that this would be more impactful for large models. One would essentially trade compute for better data efficiency. submitted by /u/floriv1999 [link] [comments]  ( 9 min )
    [n] Introducing r/AudioAI: Any AI You Can Hear!
    I couldn't find any AI sub dedicated to audio, so I’ve created r/AudioAI to serve as a hub for everything at the intersection of artificial intelligence and the world of sounds. AI-driven music, speech, audio production, and all other AI audio technologies. If anyone wants to be part of mod, let me know! submitted by /u/chibop1 [link] [comments]  ( 9 min )
  • Open

    LangDiversity: software to identify LLM errors
    Due to challenges such as hallucination, detecting errors in the output of a given prompt becomes an important challenge. LangDiversity is an implementation of "diversity measures" that are domain independent and can be used to measure the uncertainty in the result of a language model. ​ Type pip install langdiversity Video: https://www.youtube.com/watch?v=86J_K9mR7lw Web: https://neurosymbolic.asu.edu/llm-correction/ Visit https://github.com/lab-v2/langdiversity Read the paper: https://arxiv.org/abs/2308.11189 https://preview.redd.it/o0v8p9g7tmrb1.png?width=1021&format=png&auto=webp&s=ff1ac672b61f96e4669663410769127066a0674d submitted by /u/Neurosymbolic [link] [comments]  ( 9 min )
    Equation for what neurons (of 1s that attach parietal region to conscious brain regions) should attach to microprocessor to offload math functions?
    " Bio education below *. Summarization: ~1000 IO neurons attach math regions to conscious regions, low cost 1000-electrod microprocessors can run on radio. * https://youtube.com/watch?v=bhp2CkNDxME Don't want for self; want for professors and humans who program KUKA's/FANUC's for construction, and for who do calculations/optimizations for CUDA, MS Visual Studio and such, but what go up for experimentation should funds allow." sounds fun submitted by /u/2002LuvAbbaLuvU [link] [comments]  ( 9 min )
  • Open

    Reinforcement Learning + Computer Vision listing papers
    Hello everyone! A while back, I stumbled upon an interesting paper that applied Reinforcement Learning to Object Localization. I got fascinated by how computer vision tasks could be transformed into a reinforcement learning problem, making it feel like a Markov decision process! So, I've decided to create a repository to compile all the existing (published) papers that delve into Reinforcement Learning in Computer Vision: https://github.com/rayanramoul/RLCV-Papers If you have any papers in mind or recommendations to enhance the repository, please don't hesitate to share them. Your input would be greatly appreciated! Thank you! :) submitted by /u/raysamram [link] [comments]  ( 9 min )
    Multi-Agent DQN not learning for Clean Up Game - Reward slowly decreasing
    The environment of the Clean Up game is simple: in a 25*18 grid world, there's dirt spawning on the left side and apples spawning on the other. Agents get a +1 reward for eating an apple (by stepping onto it). Agents clean the dirt also by stepping on it (no reward). Agents can go up, down, left, or right. The game goes on for 1000 steps. An apple's spawn probability depends on the amount of dirt (less dirt, higher the probability). Currently, the observation for each agent has the Manhattan distance to its closest apple and dirt. I have tried multiple ways of training this, including changing the observation space of the agents. But it seems the result does not outperform random agents by any significant amount. The network is simple, it tries to take in all the observations for all the agen…  ( 10 min )
  • Open

    Entity Language Models: Monetizing Language Models – Part 2
    We must move beyond just taming…to monetizing Language Models! In part 1 of this series on Small Language Models (“Use Case Language Models: Taming the LLM Beast – Part 1”), I explored the business and operational value of Use Case-specific Small Language Models (Use Case Language Models). Use case language models are trained or adapted… Read More »Entity Language Models: Monetizing Language Models – Part 2 The post Entity Language Models: Monetizing Language Models – Part 2 appeared first on Data Science Central.  ( 23 min )
  • Open

    Botober 2023
    Since 2019 I've generated October drawing prompts using the year's most state-of-the-art text-generating models. Every year the challenges are different, but this was one of the hardest years yet. Large language models like chatgpt, GPT-4, Bing Chat, and Bard, are all tweaked to produce generic, predictable  ( 6 min )
    Bonus: There was no 2020 Botober?
    AI Weirdness: the strange side of machine learning  ( 2 min )

  • Open

    [P] Handling categorical missing data in a churn prediction model for telecom data
    I am working on a telecom dataset where I need to fit a model for predicting churn (yes or no). There are a lot of categorical columns with missing values (7,043 rows in total). What is the best way to handle missing data in this case: is it better to ignore it, or is there a better imputation method? Data columns (total 21 columns):
    - customerID: 7043 non-null object
    - gender: 7043 non-null object
    - Age: 7043 non-null int64
    - Partner: 7043 non-null object
    - Dependents: 7043 non-null object
    - tenure: 7043 non-null int64
    - PhoneService: 7043 non-null object
    - MultipleLines: 6500 non-null object
    - InternetService: 6500 non-null object
    - OnlineSecurity: 7043 non-null object
    - OnlineBackup: 7043 non-null object
    - DeviceProtection: 7043 non-null object
    - TechSupport: 7043 non-null object
    - StreamingTV: 6500 non-null object
    - StreamingMovies: 6500 non-null object
    - Contract: 6500 non-null object
    - PaperlessBilling: 7043 non-null object
    - PaymentMethod: 6500 non-null object
    - MonthlyCharges: 7043 non-null float64
    - TotalCharges: 7043 non-null object
    - Churn: 7043 non-null object
    submitted by /u/guyloveskissing [link] [comments]  ( 9 min )
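    Two common options, sketched below under the assumption of a CSV with the columns listed above (file name and model choice are placeholders): keep missingness as its own category, or impute inside an sklearn pipeline so nothing leaks from the test split.
        import pandas as pd
        from sklearn.compose import ColumnTransformer
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.impute import SimpleImputer
        from sklearn.pipeline import Pipeline
        from sklearn.preprocessing import OneHotEncoder

        df = pd.read_csv("telco_churn.csv")               # placeholder file
        cat_cols = ["MultipleLines", "InternetService", "StreamingTV",
                    "StreamingMovies", "Contract", "PaymentMethod"]
        num_cols = ["Age", "tenure", "MonthlyCharges"]

        # Option A: treat missingness as its own category (often fine for tree models):
        # df[cat_cols] = df[cat_cols].fillna("Missing")

        # Option B: impute inside a pipeline so the fit never sees the test split
        pre = ColumnTransformer([
            ("cat", Pipeline([("imp", SimpleImputer(strategy="most_frequent")),
                              ("oh", OneHotEncoder(handle_unknown="ignore"))]), cat_cols),
            ("num", SimpleImputer(strategy="median"), num_cols),
        ])
        clf = Pipeline([("pre", pre), ("rf", RandomForestClassifier(n_estimators=300))])
        clf.fit(df.drop(columns=["Churn", "customerID"]), df["Churn"])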
    [D] (How) Can you estimate inference speed of a NN model on given hardware?
    How, outside of testing, do you estimate how quickly a specific model will run on some hardware? Anything about time is rarely mentioned in papers, and if it is, it's more likely to be about training, unless the authors are specifically proud of their speed (like YOLO). Even less so in any README. Is there some way to translate the number of parameters into seconds on a given GPU/CPU, or any rules of thumb better than just setting everything up every time? submitted by /u/teleoflexuous [link] [comments]  ( 9 min )
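    One common back-of-the-envelope approach (often within an order of magnitude for transformer-style models): compute-bound work scales with FLOPs, memory-bound autoregressive decoding scales with how fast the weights stream from VRAM. A hedged sketch where the hardware peaks are datasheet values and the utilization factor is an assumption:
        # Rule-of-thumb latency estimates for a 7B-parameter model on a single A100 (assumed specs).
        params = 7e9
        peak_flops = 312e12        # A100 BF16 tensor-core peak (datasheet), FLOP/s
        mem_bw = 2.0e12            # A100 80GB HBM bandwidth, bytes/s
        utilization = 0.4          # fraction of peak you realistically achieve (assumption)

        # Compute-bound regime (long prompt processed in parallel): ~2 FLOPs per parameter per token
        prompt_tokens = 2048
        prefill_s = 2 * params * prompt_tokens / (peak_flops * utilization)

        # Memory-bound regime (one token at a time): every weight is read once per generated token
        bytes_per_param = 2        # fp16/bf16 weights
        per_token_s = params * bytes_per_param / (mem_bw * utilization)

        print(f"prefill ~{prefill_s:.2f} s, decode ~{per_token_s * 1000:.1f} ms/token")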
    [D] How do I begin with AI ?
    I'm fairly new to the AI domain. I have decent Python knowledge. I've gone through a lot of YouTube tutorials and got stuck in tutorial hell. After struggling through hours of videos, I came here as my last hope!! How do I begin? What Python frameworks should I learn? Which particular books should I refer to? submitted by /u/Dry_Ad_3887 [link] [comments]  ( 9 min )
    [D] Struggling to get interviews what to do?
    Edit: I am a USA citizen, so no need for sponsorship. I have 4 YOE at a startup company and a PhD, with four publications (2 in high-level math journals and 2 CV/DL papers in A journals) and also 4 patents. I have experience with most common CV tasks, e.g. object detection, multi-object tracking, 2D/3D human pose estimation, and monocular depth estimation. I'm well versed in typical network building blocks, e.g. conv nets, FFNs, transformers, diffusion, etc. I have a little experience with NLP, like NLTK and TTS networks, plus some other general dev technologies like EC2, S3, SQL, Mongoose, etc. That all being said, I can't seem to even get interviews these days: just straight rejections, not even conversations with recruiters. On the other hand, in 2020 I was just searching for jobs passively and had something like a 75% success rate at getting interviews. I know the job market has changed, but I'm a lot more experienced now than then and am having abysmal luck. Does anyone have any advice? I'd be happy to share my resume if that would make it easier to advise. Also open to hearing what other technologies I should/could learn. submitted by /u/AbjectDrink3276 [link] [comments]  ( 9 min )
    Arxiv [D]ives - Segment Anything
    Every Friday for the past few months we've been hosting a public paper club called "Arxiv Dives". We pick a paper, dive deep into it, and chat about it as a group. There are a lot of gems of knowledge hidden in these research papers, and the main motivation is simply to keep up with the most impactful techniques in the field by taking the time to dive in and discuss. The attendees so far have been great, and we would love for anyone who is interested to join! https://lu.ma/oxenbookclub submitted by /u/FallMindless3563 [link] [comments]  ( 9 min )
    [D] What exactly are the compute requirements for training a dense model versus an MoE?
    Hi, I'm new to ML and can't find a clear answer to this question. I find references online to a 1.8 trillion parameter model taking up the computational power of a 10B model, yet I also hear that the memory requirements are a lot higher for an MoE? If I was interested in training/inferencing, for example, a 15M dense model versus a 60M MoE with 4 experts of 15M each, what's the difference going to be? submitted by /u/vatsadev [link] [comments]  ( 9 min )
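    A tiny arithmetic sketch of the usual accounting: with top-1 routing the MoE stores all experts (memory) but runs only one per token (compute); top-k routing multiplies the active compute. This simplification assumes only the expert FFNs are duplicated and ignores shared attention/embedding parameters:
        # Simplified accounting for the 15M-dense vs 4x15M-MoE example (top-1 routing).
        dense_params = 15e6

        num_experts, expert_params, shared_params, top_k = 4, 15e6, 0.0, 1
        moe_total = shared_params + num_experts * expert_params    # what you must hold in memory
        moe_active = shared_params + top_k * expert_params         # what each token actually uses

        print(f"dense: {dense_params / 1e6:.0f}M stored, {dense_params / 1e6:.0f}M active per token")
        print(f"moe:   {moe_total / 1e6:.0f}M stored, {moe_active / 1e6:.0f}M active per token")
        # -> roughly dense-model compute per token, but ~4x the memory footprint (plus router
        #    overhead and all-to-all communication if experts are sharded across devices).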
    [D] How close are we to Neuro-Symbolic architectures that are 100% accurate?
    I’m new to AI/ML and my understanding is that (1) LLMs are SOTA in many tasks, and their shortcomings, such as ~70% accuracy, hallucinations, inability to learn from small samples, etc., are well known. (2) Neuro-symbolic approaches are apparently the way to get accuracy to 100% and solve other shortcomings. So my questions are: (3) What is the promising research in LLMs + symbolic architectures? (4) How close is it to production, rather than academic work? (5) Do we need non-LLM-based architectures instead? submitted by /u/reeldeele [link] [comments]  ( 9 min )
    [D] How to integrate a fine-tuned LLaMA 2 into a website?
    I'm an absolute beginner in machine learning. My team and I are building a chatbot that recommends medicine based on symptoms, and for that we are fine-tuning LLaMA 2. We are uploading books to train on, and we will ask questions based on those books. I found code on GitHub to fine-tune LLaMA 2, but how can I integrate it into my website? How do I connect it to my web app? I need some guidance; we have a submission in 2 weeks. If anyone is willing to mentor us on this project or just guide us on what to do, please reach out. submitted by /u/BookAny3024 [link] [comments]  ( 9 min )
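    One common pattern is to serve the fine-tuned model behind a small HTTP API that the website calls. A minimal sketch using FastAPI plus Hugging Face transformers (the model path is a placeholder, and this is one option among several, such as text-generation-inference or a hosted endpoint):
        # serve.py -- run with: uvicorn serve:app --host 0.0.0.0 --port 8000
        from fastapi import FastAPI
        from pydantic import BaseModel
        from transformers import AutoModelForCausalLM, AutoTokenizer

        MODEL_PATH = "path/to/your/finetuned-llama-2"  # placeholder path

        tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
        model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, device_map="auto")

        app = FastAPI()

        class Query(BaseModel):
            question: str

        @app.post("/ask")
        def ask(query: Query):
            inputs = tokenizer(query.question, return_tensors="pt").to(model.device)
            output = model.generate(**inputs, max_new_tokens=200)
            return {"answer": tokenizer.decode(output[0], skip_special_tokens=True)}
    The web front end then sends a POST request to /ask with a JSON body like {"question": "..."} and displays the returned answer.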
    [D] What algorithms to use for text classification
    I have some data: a Twitter description of an event as text, and the event category itself. If I have 100,000 tweets in column X and a category in Y (e.g. sporting event review, movie review, news, etc.), what is the best algorithm to match them? Should I turn the description into a bag of words and train an ML model (random forest, SVM, etc.) on the word frequencies, or can the algorithm take the word order into account? submitted by /u/AnyJello605 [link] [comments]  ( 9 min )
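    A minimal bag-of-words baseline as a sketch (the texts and labels below are hypothetical placeholders; TF-IDF with bigrams captures a little word order, while fully order-aware approaches would use sequence models or transformers):
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split
        from sklearn.pipeline import make_pipeline

        # Placeholder data; replace with the 100k tweet descriptions and categories.
        texts = ["great game last night", "this movie was boring", "election results are in"]
        labels = ["sporting event review", "movie review", "news"]

        X_train, X_test, y_train, y_test = train_test_split(texts, labels, test_size=0.2)

        clf = make_pipeline(
            TfidfVectorizer(ngram_range=(1, 2)),   # unigrams + bigrams
            LogisticRegression(max_iter=1000),
        )
        clf.fit(X_train, y_train)
        print(clf.score(X_test, y_test))
    A linear model over TF-IDF features is usually a strong, cheap baseline on 100k short texts; it is worth establishing before moving to anything order-aware.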
    [D] Deploy the Mistral 7b Generative Model on an A10 GPU on AWS
    Hello, The Mistral 7b AI model beats LLaMA 2 7b on all benchmarks and LLaMA 2 13b in many benchmarks. It is actually even on par with the LLaMA 1 34b model. So I made a quick video about how to deploy this model on an A10 GPU on an AWS EC2 g5.4xlarge instance: https://nlpcloud.com/deploy-mistral-7b-on-a10-gpu-on-aws.html I hope it will be useful. If you have recommendations about how to improve this video please don't hesitate to let me know, that will be very much appreciated! Julien submitted by /u/juliensalinas [link] [comments]  ( 9 min )
    [D] CIDEr values in PaLI model and XM 3600 dataset
    I am reading "PaLI: A Jointly-Scaled Multilingual Language-Image Model". In their Table 2 (page 6), it is reported that the Thapliyal et al. (2022) (0.8B) model got a CIDEr of 57.6 on XM 3600 for English. Thapliyal et al. (2022) is "Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset". However, in that paper the CIDEr values are reported as less than 1; for example, the largest model got a CIDEr of 0.584 on XM 3600 for English. Could someone explain why these values differ so much? submitted by /u/KingsmanVince [link] [comments]  ( 9 min )
    [R] Pathway to self-learning mathematics and statistics for ML research
    Hey everyone. I am very passionate about getting into ML research and was wondering what the learning pathway looks like, particularly with regard to the theoretical math and statistics involved. For context: I am a second-year undergraduate. By the end of this year, I will have taken and finished a Multivariable Calculus with Proofs course, so that is my current starting point. I have been working with ML for the last 3 years and am proficient in Python and frameworks like PyTorch. I have also made my own implementations of several research papers (LSTMs, GRUs, Transformers, ELMo, BERT, GPT, as well as a few computer vision papers). I have a good general intuition of how deep learning works, but I want to formalise this knowledge with an adequate mathematical background so that I can eventually pursue a career in research. I understand that I have plenty of time until I get there, and I am willing to dedicate it to grinding out the required math and statistics. I have done my research on this sub and other forums, and here are a few resources that stood out:
        Mathematics for Machine Learning by Deisenroth, Faisal and Ong
        Advanced Calculus of Several Variables by C. H. Edwards Jr.
        Mathematical Methods Lecture Notes from Imperial College by Deisenroth and Cheraghchi
        The original information theory paper by Shannon
        The Elements of Statistical Learning by Hastie, Tibshirani and Friedman
        Pattern Recognition and Machine Learning by Bishop
        The Probabilistic Machine Learning series by Kevin P. Murphy
        Deep Learning by Goodfellow, Bengio and Courville
        Mathematics of Machine Learning on MIT OCW (here)
    My question is: in what order should I start self-learning, given the (somewhat limited) background knowledge I have? Also, are there any other resources that would help? submitted by /u/Far_Clothes_5054 [link] [comments]  ( 10 min )
    [D] What is the best open-source framework to create a synthetic and domain specific dataset for fine-tuning small models?
    Hi everyone. Given data points such as phi-1.5 performing as well as 7B models on some tasks, it seems plausible that small models can be quite capable on specific tasks. I am working on BlindChat, an open-source and private solution to run small LLMs in your browser, and I am interested in fine-tuning phi-1.5 on some domain-specific data. I am thinking of taking an approach similar to the researchers of the phi paper, namely creating a high-quality dataset using GPT-3.5 / GPT-4. Do you know of good open-source frameworks that make it easy to create high-quality data for a specific task using an existing large model, like GPT-3.5/4 or Llama 2 70B? submitted by /u/Separate-Still3770 [link] [comments]  ( 9 min )
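    Even without a dedicated framework, the core generation loop is short. A sketch using the OpenAI Python client (the topic list, prompt wording, and output file are placeholders; the same pattern works with a local Llama 2 70B behind any chat-style API):
        import json
        from openai import OpenAI  # requires OPENAI_API_KEY in the environment

        client = OpenAI()
        topics = ["password hygiene", "phishing emails", "2FA basics"]  # placeholder domain topics

        with open("synthetic_dataset.jsonl", "w") as f:
            for topic in topics:
                resp = client.chat.completions.create(
                    model="gpt-3.5-turbo",
                    messages=[
                        {"role": "system", "content": "You write short, self-contained textbook-style lessons."},
                        {"role": "user", "content": f"Write a concise lesson with one worked example about: {topic}"},
                    ],
                )
                f.write(json.dumps({"topic": topic, "text": resp.choices[0].message.content}) + "\n")
    In practice the deduplication and quality-filtering steps tend to matter more than which framework drives this loop.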
    [P] How do I train or tune an LLM like LLaMA for my business
    I want to tune Facebook's LLaMA, or any available LLM, to answer questions about my business. The idea is to provide a prompt describing the business and some Q&As; then, based on the provided information, the AI chatbot will answer customers who ask questions about the business. If the answer to a question is not known, or the question is not relevant, the bot should say "I don't know". submitted by /u/the_aceix [link] [comments]  ( 9 min )
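    For this use case, prompt construction alone (with or without fine-tuning) often gets most of the way there. A model-agnostic sketch, with all business details as hypothetical placeholders:
        # Pack the business description and known Q&As into the prompt and tell the
        # model to refuse when the answer is not covered.
        BUSINESS_INFO = "We are a bakery in Accra, open 8am-6pm, delivering within the city."  # placeholder
        KNOWN_QA = [
            ("Do you deliver?", "Yes, we deliver within the city for a small fee."),
            ("What are your opening hours?", "8am to 6pm, Monday to Saturday."),
        ]

        def build_prompt(customer_question: str) -> str:
            qa_block = "\n".join(f"Q: {q}\nA: {a}" for q, a in KNOWN_QA)
            return (
                "You answer customer questions for the business described below.\n"
                f"Business description: {BUSINESS_INFO}\n"
                f"Known questions and answers:\n{qa_block}\n"
                "If the answer is not contained in the information above, or the question "
                "is unrelated to the business, reply exactly: I don't know.\n"
                f"Customer question: {customer_question}\nAnswer:"
            )

        print(build_prompt("Do you ship internationally?"))
    Fine-tuning can then teach the style and the refusal behaviour, but keeping the facts in the prompt (or in a retrieval step) makes them easy to update without retraining.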
  • Open

    I have blocked user u/NuseAI ...
    Hi, I have never done this before, but I have blocked user u/NuseAI from my feeds. He/she is posting 'news' all over the AI subs, including this one, at the moment and is filling up my timeline, and I simply don't feel right about what they are up to. Is it an AI bot? Is it a karma farmer? Is it some sort of spam? Am I being over-cautious? If the consensus is that they are a normal poster, fine, I'll re-enable their posts. In the meantime I'm enjoying a less cluttered feed! submitted by /u/MrEloi [link] [comments]  ( 9 min )
    'Counterfeit people': The danger posed by Meta’s AI celebrity lookalike chatbots
    Meta has launched chatbots with personalities similar to certain celebrities, which some experts believe could be dangerous. These chatbots have their own faces and social media accounts, and Meta is working on giving them a voice. However, experts argue that the idea of chatbots with personalities is impossible, as algorithms cannot demonstrate intention or free will. There is also a risk that chatbots with personalities could express problematic opinions, as seen in Meta's testing. Meta's project is driven by profit, as users are more likely to engage with chatbots that seem human. Experts believe that Meta should have explained the limits of these chatbots instead of emphasizing their human characteristics. Source : https://www.france24.com/en/technology/20230930-counterfeit-people-the-dangers-posed-by-meta-s-ai-celebrity-lookalike-chatbots submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Artificially Intelligent, Genuinely Creative: How AI's Triumph Over Human Creators Exposes the Illusion of Intellectual Property
    submitted by /u/DukeWilder [link] [comments]  ( 9 min )
    Is my domain name a good idea? What can I build on it? Go Go AI Go dot com .... No webpage on it now, any good ideas???
    I was cooking chicken wings one evening in the not too distant past and this idea popped into my head. Before the night was over I went online and bought the domain name GoGoAIGo.com and then the .ai version also. I put the .com version up on Sedo (sedo.com/search/details/?domain=GoGoAIGo.com) for sale, and I actually now own the .com, .ai, .org and .net versions of that phrase. Not only my decade but the two generational decades in front of me and the one behind me can remember our ole Inspector Gadget friend who had a similar, though not exact, phrase that he would say. I'm an individual who may hold onto something if I feel it has intrinsic value for future development, which I think this can have if laid out in an appropriate fashion. I'm working on another business project right now and I own some trademarks for it, so I'm not exactly a newbie here, I'm just kind of fresh to the AI realm. I think it's overblown right now but will be better fine-tuned over the next 5-7 years, and society will find a better seat for it. I could see this domain being like a search engine or something, maybe even something to do with robots. I expect AI robots moving forward will be regulated and placed into various classes as we integrate certain ones into our society. Let's be honest, the light switch isn't flipping overnight or even in one quick year over this AI stuff. I'm in no rush to have a piece of AI wash my dishes for me, to be honest. The last robotic thing I was thinking about getting was a robot mower to cut a field; I believe they are working on those now. Anybody have any unique ideas for me? I used to play with Lego robots way back in high school in the early 2000s. Seems like this website would make a great search engine, but honestly there are other phrases that can be put into play with society also. Thanks for any mental stimulation you can toss in my direction. submitted by /u/Wise_Cut_2543 [link] [comments]  ( 10 min )
    CGPT-4, how could an AI app designed to move people from their screens to better enjoying the people in their life do this?
    Imagine an app that's like a helpful buddy in your pocket, always looking out for the best moments to nudge you into some real-world socializing. For example, say you're a fan of watching sports. The app notices you frequently check scores or read articles on sports sites during weekends. Right before a big game, it pops up and says, "How about inviting some friends over to watch the game?" Now let's talk about making socializing a sort of game. Think of the way Fitbit rewards you for walking 10,000 steps. Similarly, this app could reward you with "social points" for various activities. Invite a friend for coffee? 10 points. Call your mom? 15 points. Organize a barbecue? 50 points. And so on. These points could unlock virtual badges or even real-world rewards like discounts at local restaurants to encourage you to keep going. When it comes to setting personal goals, let's say you've been wanting to improve your relationship with a sibling. You set a goal in the app to have at least one meaningful conversation with them each week. The app then reminds you on a lazy Sunday afternoon, suggesting, "Why not call your sister now? It’s a good time to catch up." And for reflection, after you've hung out with your friends to watch the game or had that talk with your sister, the app asks you to rate how good you felt on a scale of 1-10. Over time, you'll see a graph of your happiness levels correlated with your social activities, making it super clear that quality time with people is a mood booster. The whole idea is to keep it simple but effective, helping you to naturally weave more social interactions into your life without making you feel overwhelmed or stressed. submitted by /u/Georgeo57 [link] [comments]  ( 10 min )
    Is AI a Platform Shift?
    AI has the potential to be a platform shift, similar to previous shifts like personal computers, the internet, and mobile. Platform shifts change the dominant layer that applications are built on and can capture the majority of value from the previous generation. AI could change distribution, business models, and what's possible in workflows. Changes in distribution could lead to new aggregators replacing old ones, making the aggregation of quality content more difficult. The business model may not change significantly, with AI likely being delivered as software-as-a-service. AI can enable new workflows and drastically change existing ones. While incumbents may accrue significant value, new platforms could also replace old ones. Source : https://matt-rickard.com/is-ai-a-platform-shift submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Is there a market for Small Language Models for specific jobs/domains?
    It seems that large language models are getting bigger and bigger, and as they grow they need more and more processing power. I know that some LLM developers have made smaller versions to test how small they can be made and still function. But what happens when you want an LLM to do a specific job? Surely it only needs a fraction of the data a general-purpose model does. Potential benefits of SLMs: less data; potentially faster; less room to hallucinate or go wrong; a smaller set of behaviours to test exhaustively; reduced running costs; lower-spec hardware needs. Has anyone tried dedicating an LLM to a specific job/task and then optimizing its data size to create an SLM? TL;DR: How large does an LLM have to be for a toaster or microwave? Talkie Toaster https://www.youtube.com/watch?v=vLm6oTCFcxQ submitted by /u/Arowx [link] [comments]  ( 9 min )
    Books 3 has revealed thousands of pirated Australian books. In the age of AI, is copyright law still fit for purpose?
    submitted by /u/Jariiari7 [link] [comments]  ( 9 min )
    Deep dive into Mastering Prompt Engineering (Prompt-tier list)
    submitted by /u/Senior_tasteey [link] [comments]  ( 9 min )
    Looking for open source headless text to singing or better yet MIDI to singing software
    Scoured the Internet using all available tools. All I've come up with is proprietary and obsolete software and/or GUI-based software. My goal is to create an ElevenLabs type api but for singing. Something like Flinger (dead) would be ideal. If I can't find it I plan to write it but I'd rather not reinvent the wheel. submitted by /u/geeezeredm [link] [comments]  ( 9 min )
    Is it possible for AI to deeply analyze importance of thousands of daily news?
    I have access to the text of thousands of world news items daily. Is it possible to build an AI that would analyze them and sort them by importance? All I could find that comes close is NLP for analyzing text content and extracting keywords or metadata, but this approach doesn't work well. I want the AI to grasp the essence of the news and deeply understand its importance: to comprehend how an event affects many people's lives and has a significant impact on society or the world as a whole. submitted by /u/canman44999 [link] [comments]  ( 9 min )
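    One pragmatic pattern is to have an LLM rate each article on a fixed impact scale and then sort by the score. This is a sketch only; call_llm is a hypothetical placeholder for whatever model or API is used, and the stub below returns a constant just so the code runs:
        RATING_PROMPT = (
            "Rate the importance of this news item on a scale of 1-10, where 10 means it "
            "significantly affects many people's lives or the world as a whole. "
            "Reply with the number only.\n\nNews item:\n{article}"
        )

        def call_llm(prompt: str) -> str:
            return "5"  # stub so the sketch runs; replace with a real model call

        def score(article_text: str) -> int:
            reply = call_llm(RATING_PROMPT.format(article=article_text))
            try:
                return int(reply.strip())
            except ValueError:
                return 0  # unparsable replies sink to the bottom

        def rank(articles):
            return sorted(articles, key=score, reverse=True)

        print(rank(["Minor local event ...", "Global summit reaches historic agreement ..."]))
    Whether a rating like this captures "importance" well enough is an open question; calibrating the prompt against a small set of human-ranked articles is a cheap sanity check.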
    Dalle-3 has me thinking about my unborn child and reality itself.
    I was able to throw these images together in seconds and it has me stunned. This is all in the first year of mainstream AI; where are we going to be this time next year? Philosophically, what do you believe is going to happen to our paradigms of reality over the coming years? This is an especially challenging thought because we consume so much content and information digitally. I'm a little worried about how humans will or will not adjust to this incoming technology. I'm having my first child soon and it's interesting to think about what I may have to teach them: that nothing you consume digitally is real, only what you can experience with all 5 senses in your local environment is. Strange thoughts to be having, for sure. With peace, Aqua. submitted by /u/Aquaritek [link] [comments]  ( 9 min )
    The Ethical Dilemmas of AI in Sci-Fi and Reality
    An interesting article about ethics and AI in the real world versus what we find in scifi. Exploring points like privacy invasion, possible sentience, control and moral decisions. https://discover.hubpages.com/technology/the-ethical-dilemmas-of-ai-in-sci-fi-and-reality submitted by /u/No_Adhesiveness_7209 [link] [comments]  ( 9 min )
    One-Minute Daily AI News 9/29/2023
    Meta Platforms (META.O) Chief Executive Mark Zuckerberg on Wednesday rolled out new AI products for consumers, including bots that create photo-realistic images and smart glasses that answer questions, as well as an updated virtual-reality headset.[1] The European Union is examining alleged anticompetitive practices in chips used for artificial intelligence, a market that Nvidia (NVDA.O) dominates, Bloomberg News reported on Friday, citing people familiar with the matter.[2] Sex robots powered by futuristic AI algorithm will one day give humans the best sex of their lives, it has been sensationally claimed.[3] National Security Agency Director Army Gen. Paul M. Nakasone today announced the creation of a new entity to oversee the development and integration of artificial intelligence capabilities within U.S. national security systems.[4] Sources: [1] https://www.reuters.com/technology/meta-signal-future-arvr-investments-annual-connect-conference-2023-09-27/ [2] https://www.reuters.com/technology/eu-starts-early-stage-probe-into-nvidia-dominated-ai-chip-market-abuses-2023-09-29/ [3] https://www.dailystar.co.uk/news/weird-news/sex-robots-using-ai-give-31059169 [4] https://www.defense.gov/News/News-Stories/Article/Article/3541838/ai-security-center-to-open-at-national-security-agency/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
  • Open

    Consecutive coupon collector problem
    Coupon collector problem Suppose you have a bag of balls labeled 1 through 1,000. You draw balls one at a time and put them back after each draw. How many draws would you have to make before you’ve seen every ball at least once? This is the coupon collector problem with N = 1000, […] Consecutive coupon collector problem first appeared on John D. Cook.  ( 6 min )
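    For the classical version, the expected number of draws is N times the N-th harmonic number; a quick computation for N = 1000 (the consecutive variant discussed in the post is a different question and not reproduced here):
        # Expected draws to see all N balls at least once: N * H_N ~ N * (ln N + 0.5772).
        from math import log

        N = 1000
        exact = N * sum(1.0 / i for i in range(1, N + 1))
        approx = N * (log(N) + 0.5772156649)
        print(round(exact))   # about 7485 draws
        print(round(approx))  # same ballpark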
  • Open

    Testing RNN with RLlib
    Hi folks! Since you've saved my ass before, maybe you have an idea about my issue here, too. I'm training and testing a custom RNN, but I receive the following error message: File "/home/.conda/envs/ray/lib/python3.9/site-packages/ray/rllib/utils/threading.py", line 24, in wrapper return func(self, *a, **k) File "/home/.conda/envs/ray/lib/python3.9/site-packages/ray/rllib/policy/torch_policy_v2.py", line 1291, in _compute_action_helper dist_inputs, state_out = self.model(input_dict, state_batches, seq_lens) File "/home/.conda/envs/ray/lib/python3.9/site-packages/ray/rllib/models/modelv2.py", line 259, in __call__ res = self.forward(restored, state or [], seq_lens) File "/home/.conda/envs/ray/lib/python3.9/site-packages/ray/rllib/models/torch/recurrent_net.py", line 92, in forward i…  ( 9 min )
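    The traceback above is cut off, so this is not a diagnosis, just a sanity-check sketch of the interface RLlib expects from a custom torch RNN model (the class and method names come from RLlib's recurrent_net.py referenced in the error; the layer sizes are arbitrary):
        import torch.nn as nn
        from ray.rllib.models.torch.recurrent_net import RecurrentNetwork

        class MyRNN(RecurrentNetwork, nn.Module):
            def __init__(self, obs_space, action_space, num_outputs, model_config, name):
                nn.Module.__init__(self)
                super().__init__(obs_space, action_space, num_outputs, model_config, name)
                hidden = 64
                self.rnn = nn.GRU(obs_space.shape[0], hidden, batch_first=True)
                self.logits = nn.Linear(hidden, num_outputs)
                self.value_branch = nn.Linear(hidden, 1)
                self._features = None

            def get_initial_state(self):
                # one tensor per RNN state; RLlib batches and zero-pads these for you
                return [self.logits.weight.new_zeros(64)]

            def forward_rnn(self, inputs, state, seq_lens):
                # inputs: (batch, time, obs_dim); state: list of (batch, hidden) tensors
                out, h = self.rnn(inputs, state[0].unsqueeze(0))
                self._features = out
                return self.logits(out), [h.squeeze(0)]

            def value_function(self):
                return self.value_branch(self._features).reshape(-1)
    A mismatch between the shapes returned by get_initial_state() and those returned by forward_rnn() is one common cause of failures in that modelv2/recurrent_net call chain, though the truncated message above doesn't show which case applies here.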
  • Open

    RACH-Space: Reconstructing Adaptive Convex Hull Space with applications in weak supervision. (arXiv:2307.04870v3 [cs.LG] UPDATED)
    We introduce RACH-Space, a novel classification method in ensemble learning. In particular, we show its applicability as a label model for weakly supervised learning. RACH-Space offers simplicity in implementation with minimal assumptions on the data or weak signals. The model is well suited for scenarios where fully labeled data is not available. Our method is built upon geometrical interpretation of the space spanned by weak signals. Our analysis of the high dimensional convex hull structure underlying general set of weak signals bridges geometry with machine learning. Empirical results also demonstrate that RACH-Space works well in practice and compares favorably to best existing label models for weakly supervised learning.  ( 2 min )
    From Complexity to Clarity: Analytical Expressions of Deep Neural Network Weights via Clifford's Geometric Algebra and Convexity. (arXiv:2309.16512v1 [cs.LG])
    In this paper, we introduce a novel analysis of neural networks based on geometric (Clifford) algebra and convex optimization. We show that optimal weights of deep ReLU neural networks are given by the wedge product of training samples when trained with standard regularized loss. Furthermore, the training problem reduces to convex optimization over wedge product features, which encode the geometric structure of the training dataset. This structure is given in terms of signed volumes of triangles and parallelotopes generated by data vectors. The convex problem finds a small subset of samples via $\ell_1$ regularization to discover only relevant wedge product features. Our analysis provides a novel perspective on the inner workings of deep neural networks and sheds light on the role of the hidden layers.  ( 2 min )
    MHG-GNN: Combination of Molecular Hypergraph Grammar with Graph Neural Network. (arXiv:2309.16374v1 [cs.LG])
    Property prediction plays an important role in material discovery. As an initial step to eventually develop a foundation model for material science, we introduce a new autoencoder called the MHG-GNN, which combines graph neural network (GNN) with Molecular Hypergraph Grammar (MHG). Results on a variety of property prediction tasks with diverse materials show that MHG-GNN is promising.  ( 2 min )
    Group-Agent Reinforcement Learning. (arXiv:2202.05135v4 [cs.LG] UPDATED)
    It can largely benefit the reinforcement learning (RL) process of each agent if multiple geographically distributed agents perform their separate RL tasks cooperatively. Different from multi-agent reinforcement learning (MARL) where multiple agents are in a common environment and should learn to cooperate or compete with each other, in this case each agent has its separate environment and only communicates with others to share knowledge without any cooperative or competitive behaviour as a learning outcome. In fact, this scenario exists widely in real life whose concept can be utilised in many applications, but is not well understood yet and not well formulated. As the first effort, we propose group-agent system for RL as a formulation of this scenario and the third type of RL system with respect to single-agent and multi-agent systems. We then propose a distributed RL framework called DDAL (Decentralised Distributed Asynchronous Learning) designed for group-agent reinforcement learning (GARL). We show through experiments that DDAL achieved desirable performance with very stable training and has good scalability.  ( 2 min )
    AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models. (arXiv:2309.16414v1 [cs.CV])
    Classifiers built upon vision-language models such as CLIP have shown remarkable zero-shot performance across a broad range of image classification tasks. Prior work has studied different ways of automatically creating descriptor sets for every class based on prompt templates, ranging from manually engineered templates over templates obtained from a large language model to templates built from random words and characters. In contrast, deriving zero-shot classifiers from the respective encoded class descriptors has remained nearly unchanged, that is: classify to the class that maximizes the cosine similarity between its averaged encoded class descriptors and the encoded image. However, weighting all class descriptors equally can be suboptimal when certain descriptors match visual clues on a given image better than others. In this work, we propose AutoCLIP, a method for auto-tuning zero-shot classifiers. AutoCLIP assigns to each prompt template per-image weights, which are derived from statistics of class descriptor-image similarities at inference time. AutoCLIP is fully unsupervised, has very low overhead, and can be easily implemented in a few lines of code. We show that for a broad range of vision-language models, datasets, and prompt templates, AutoCLIP outperforms baselines consistently and by up to 3 percentage points in accuracy.  ( 2 min )
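    A rough numpy sketch of the general idea as described in the abstract (per-image template weights derived from descriptor-image similarity statistics); the exact weighting statistic used in the paper may differ from the max-based one assumed here:
        import numpy as np

        def autoclip_like_scores(image_emb, descriptor_embs, temperature=1.0):
            # descriptor_embs: (n_templates, n_classes, dim), L2-normalized
            # image_emb: (dim,), L2-normalized
            sims = descriptor_embs @ image_emb          # (n_templates, n_classes)
            template_stat = sims.max(axis=1)            # how well each template matches this image
            w = np.exp(template_stat / temperature)
            w /= w.sum()                                # per-image template weights
            return (w[:, None] * sims).sum(axis=0)      # weighted class scores

        rng = np.random.default_rng(0)
        d = rng.normal(size=(5, 10, 512)); d /= np.linalg.norm(d, axis=-1, keepdims=True)
        x = rng.normal(size=512); x /= np.linalg.norm(x)
        print(autoclip_like_scores(x, d).argmax())      # predicted class index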
    An Uncertainty-Aware Pseudo-Label Selection Framework using Regularized Conformal Prediction. (arXiv:2309.15963v1 [cs.LG])
    Consistency regularization-based methods are prevalent in semi-supervised learning (SSL) algorithms due to their exceptional performance. However, they mainly depend on domain-specific data augmentations, which are not usable in domains where data augmentations are less practicable. On the other hand, Pseudo-labeling (PL) is a general and domain-agnostic SSL approach that, unlike consistency regularization-based methods, does not rely on the domain. PL underperforms due to the erroneous high-confidence predictions from poorly calibrated models. This paper proposes an uncertainty-aware pseudo-label selection framework that employs uncertainty sets yielded by the conformal regularization algorithm to fix poorly calibrated neural networks, reducing noisy training data. The codes of this work are available at: https://github.com/matinmoezzi/ups conformal classification  ( 2 min )
    Compilation as a Defense: Enhancing DL Model Attack Robustness via Tensor Optimization. (arXiv:2309.16577v1 [cs.LG])
    Adversarial Machine Learning (AML) is a rapidly growing field of security research, with an often overlooked area being model attacks through side-channels. Previous works show such attacks to be serious threats, though little progress has been made on efficient remediation strategies that avoid costly model re-engineering. This work demonstrates a new defense against AML side-channel attacks using model compilation techniques, namely tensor optimization. We show relative model attack effectiveness decreases of up to 43% using tensor optimization, discuss the implications, and direction of future work.  ( 2 min )
    Compositional Program Generation for Systematic Generalization. (arXiv:2309.16467v1 [cs.LG])
    Compositional generalization is a key ability of humans that enables us to learn new concepts from only a handful of examples. Machine learning models, including the now ubiquitous transformers, struggle to generalize in this way, and typically require thousands of examples of a concept during training in order to generalize meaningfully. This difference in ability between humans and artificial neural architectures motivates this study of a neuro-symbolic architecture called the Compositional Program Generator (CPG). CPG has three key features: modularity, type abstraction, and recursive composition, that enable it to generalize both systematically to new concepts in a few-shot manner, as well as productively by length on various sequence-to-sequence language tasks. For each input, CPG uses a grammar of the input domain and a parser to generate a type hierarchy in which each grammar rule is assigned its own unique semantic module, a probabilistic copy or substitution program. Instances with the same hierarchy are processed with the same composed program, while those with different hierarchies may be processed with different programs. CPG learns parameters for the semantic modules and is able to learn the semantics for new types incrementally. Given a context-free grammar of the input language and a dictionary mapping each word in the source language to its interpretation in the output language, CPG can achieve perfect generalization on the SCAN and COGS benchmarks, in both standard and extreme few-shot settings.  ( 3 min )
    Generative Semi-supervised Learning with Meta-Optimized Synthetic Samples. (arXiv:2309.16143v1 [cs.LG])
    Semi-supervised learning (SSL) is a promising approach for training deep classification models using labeled and unlabeled datasets. However, existing SSL methods rely on a large unlabeled dataset, which may not always be available in many real-world applications due to legal constraints (e.g., GDPR). In this paper, we investigate the research question: Can we train SSL models without real unlabeled datasets? Instead of using real unlabeled datasets, we propose an SSL method using synthetic datasets generated from generative foundation models trained on datasets containing millions of samples in diverse domains (e.g., ImageNet). Our main concepts are identifying synthetic samples that emulate unlabeled samples from generative foundation models and training classifiers using these synthetic samples. To achieve this, our method is formulated as an alternating optimization problem: (i) meta-learning of generative foundation models and (ii) SSL of classifiers using real labeled and synthetic unlabeled samples. For (i), we propose a meta-learning objective that optimizes latent variables to generate samples that resemble real labeled samples and minimize the validation loss. For (ii), we propose a simple unsupervised loss function that regularizes the feature extractors of classifiers to maximize the performance improvement obtained from synthetic samples. We confirm that our method outperforms baselines using generative foundation models on SSL. We also demonstrate that our methods outperform SSL using real unlabeled datasets in scenarios with extremely small amounts of labeled datasets. This suggests that synthetic samples have the potential to provide improvement gains more efficiently than real unlabeled data.  ( 3 min )
    Instance-Agnostic Geometry and Contact Dynamics Learning. (arXiv:2309.05832v2 [cs.CV] UPDATED)
    This work presents an instance-agnostic learning framework that fuses vision with dynamics to simultaneously learn shape, pose trajectories, and physical properties via the use of geometry as a shared representation. Unlike many contact learning approaches that assume motion capture input and a known shape prior for the collision model, our proposed framework learns an object's geometric and dynamic properties from RGBD video, without requiring either category-level or instance-level shape priors. We integrate a vision system, BundleSDF, with a dynamics system, ContactNets, and propose a cyclic training pipeline to use the output from the dynamics module to refine the poses and the geometry from the vision module, using perspective reprojection. Experiments demonstrate our framework's ability to learn the geometry and dynamics of rigid and convex objects and improve upon the current tracking framework.  ( 2 min )
    Safe Imitation Learning of Nonlinear Model Predictive Control for Flexible Robots. (arXiv:2212.02941v2 [cs.RO] UPDATED)
    Flexible robots may overcome some of the industry's major challenges, such as enabling intrinsically safe human-robot collaboration and achieving a higher load-to-mass ratio. However, controlling flexible robots is complicated due to their complex dynamics, which include oscillatory behavior and a high-dimensional state space. NMPC offers an effective means to control such robots, but its extensive computational demands often limit its application in real-time scenarios. To enable fast control of flexible robots, we propose a framework for a safe approximation of NMPC using imitation learning and a predictive safety filter. Our framework significantly reduces computation time while incurring a slight loss in performance. Compared to NMPC, our framework shows more than an eightfold improvement in computation time when controlling a three-dimensional flexible robot arm in simulation, all while guaranteeing safety constraints. Notably, our approach outperforms conventional reinforcement learning methods. The development of fast and safe approximate NMPC holds the potential to accelerate the adoption of flexible robots in industry.  ( 2 min )
    Deep learning models for price forecasting of financial time series: A review of recent advancements: 2020-2022. (arXiv:2305.04811v2 [q-fin.ST] UPDATED)
    Accurately predicting the prices of financial time series is essential and challenging for the financial sector. Owing to recent advancements in deep learning techniques, deep learning models are gradually replacing traditional statistical and machine learning models as the first choice for price forecasting tasks. This shift in model selection has led to a notable rise in research related to applying deep learning models to price forecasting, resulting in a rapid accumulation of new knowledge. Therefore, we conducted a literature review of relevant studies over the past three years with a view to aiding researchers and practitioners in the field. This review delves deeply into deep learning-based forecasting models, presenting information on model architectures, practical applications, and their respective advantages and disadvantages. In particular, detailed information is provided on advanced models for price forecasting, such as Transformers, generative adversarial networks (GANs), graph neural networks (GNNs), and deep quantum neural networks (DQNNs). The present contribution also includes potential directions for future research, such as examining the effectiveness of deep learning models with complex structures for price forecasting, extending from point prediction to interval prediction using deep learning models, scrutinising the reliability and validity of decomposition ensembles, and exploring the influence of data volume on model performance.  ( 3 min )
    TinyMetaFed: Efficient Federated Meta-Learning for TinyML. (arXiv:2307.06822v3 [cs.LG] UPDATED)
    The field of Tiny Machine Learning (TinyML) has made substantial advancements in democratizing machine learning on low-footprint devices, such as microcontrollers. The prevalence of these miniature devices raises the question of whether aggregating their knowledge can benefit TinyML applications. Federated meta-learning is a promising answer to this question, as it addresses the scarcity of labeled data and heterogeneous data distribution across devices in the real world. However, deploying TinyML hardware faces unique resource constraints, making existing methods impractical due to energy, privacy, and communication limitations. We introduce TinyMetaFed, a model-agnostic meta-learning framework suitable for TinyML. TinyMetaFed facilitates collaborative training of a neural network initialization that can be quickly fine-tuned on new devices. It offers communication savings and privacy protection through partial local reconstruction and Top-P% selective communication, computational efficiency via online learning, and robustness to client heterogeneity through few-shot learning. The evaluations on three TinyML use cases demonstrate that TinyMetaFed can significantly reduce energy consumption and communication overhead, accelerate convergence, and stabilize the training process.  ( 2 min )
    Exploiting Edge Features in Graphs with Fused Network Gromov-Wasserstein Distance. (arXiv:2309.16604v1 [stat.ML])
    Pairwise comparison of graphs is key to many applications in Machine learning ranging from clustering, kernel-based classification/regression and more recently supervised graph prediction. Distances between graphs usually rely on informative representations of these structured objects such as bag of substructures or other graph embeddings. A recently popular solution consists in representing graphs as metric measure spaces, allowing to successfully leverage Optimal Transport, which provides meaningful distances allowing to compare them: the Gromov-Wasserstein distances. However, this family of distances overlooks edge attributes, which are essential for many structured objects. In this work, we introduce an extension of Gromov-Wasserstein distance for comparing graphs whose both nodes and edges have features. We propose novel algorithms for distance and barycenter computation. We empirically show the effectiveness of the novel distance in learning tasks where graphs occur in either input space or output space, such as classification and graph prediction.  ( 2 min )
    Capturing the Diffusive Behavior of the Multiscale Linear Transport Equations by Asymptotic-Preserving Convolutional DeepONets. (arXiv:2306.15891v3 [cs.LG] UPDATED)
    In this paper, we introduce two types of novel Asymptotic-Preserving Convolutional Deep Operator Networks (APCONs) designed to address the multiscale time-dependent linear transport problem. We observe that the vanilla physics-informed DeepONets with modified MLP may exhibit instability in maintaining the desired limiting macroscopic behavior. Therefore, this necessitates the utilization of an asymptotic-preserving loss function. Drawing inspiration from the heat kernel in the diffusion equation, we propose a new architecture called Convolutional Deep Operator Networks, which employ multiple local convolution operations instead of a global heat kernel, along with pooling and activation operations in each filter layer. Our APCON methods possess a parameter count that is independent of the grid size and are capable of capturing the diffusive behavior of the linear transport problem. Finally, we validate the effectiveness of our methods through several numerical examples.  ( 2 min )
    Visual In-Context Learning for Few-Shot Eczema Segmentation. (arXiv:2309.16656v1 [cs.CV])
    Automated diagnosis of eczema from digital camera images is crucial for developing applications that allow patients to self-monitor their recovery. An important component of this is the segmentation of the eczema region from such images. Current methods for eczema segmentation rely on deep neural networks such as convolutional (CNN)-based U-Net or transformer-based Swin U-Net. While effective, these methods require a high volume of annotated data, which can be difficult to obtain. Here, we investigate the capabilities of visual in-context learning that can perform few-shot eczema segmentation with just a handful of examples and without any need for retraining models. Specifically, we propose a strategy for applying in-context learning for eczema segmentation with a generalist vision model called SegGPT. When benchmarked on a dataset of annotated eczema images, we show that SegGPT with just 2 representative example images from the training dataset performs better (mIoU: 36.69) than a CNN U-Net trained on 428 images (mIoU: 32.60). We also discover that using a larger number of examples for SegGPT may in fact be harmful to its performance. Our result highlights the importance of visual in-context learning in developing faster and better solutions to skin imaging tasks. Our result also paves the way for developing inclusive solutions that can cater to minorities in the demographics who are typically heavily under-represented in the training data.  ( 2 min )
    Harmonic-NAS: Hardware-Aware Multimodal Neural Architecture Search on Resource-constrained Devices. (arXiv:2309.06612v2 [cs.LG] UPDATED)
    The recent surge of interest surrounding Multimodal Neural Networks (MM-NN) is attributed to their ability to effectively process and integrate multiscale information from diverse data sources. MM-NNs extract and fuse features from multiple modalities using adequate unimodal backbones and specific fusion networks. Although this helps strengthen the multimodal information representation, designing such networks is labor-intensive. It requires tuning the architectural parameters of the unimodal backbones, choosing the fusing point, and selecting the operations for fusion. Furthermore, multimodality AI is emerging as a cutting-edge option in Internet of Things (IoT) systems where inference latency and energy consumption are critical metrics in addition to accuracy. In this paper, we propose Harmonic-NAS, a framework for the joint optimization of unimodal backbones and multimodal fusion networks with hardware awareness on resource-constrained devices. Harmonic-NAS involves a two-tier optimization approach for the unimodal backbone architectures and fusion strategy and operators. By incorporating the hardware dimension into the optimization, evaluation results on various devices and multimodal datasets have demonstrated the superiority of Harmonic-NAS over state-of-the-art approaches achieving up to 10.9% accuracy improvement, 1.91x latency reduction, and 2.14x energy efficiency gain.  ( 2 min )
    Learning Large-Scale MTP$_2$ Gaussian Graphical Models via Bridge-Block Decomposition. (arXiv:2309.13405v2 [cs.LG] UPDATED)
    This paper studies the problem of learning the large-scale Gaussian graphical models that are multivariate totally positive of order two ($\text{MTP}_2$). By introducing the concept of bridge, which commonly exists in large-scale sparse graphs, we show that the entire problem can be equivalently optimized through (1) several smaller-scaled sub-problems induced by a \emph{bridge-block decomposition} on the thresholded sample covariance graph and (2) a set of explicit solutions on entries corresponding to \emph{bridges}. From practical aspect, this simple and provable discipline can be applied to break down a large problem into small tractable ones, leading to enormous reduction on the computational complexity and substantial improvements for all existing algorithms. The synthetic and real-world experiments demonstrate that our proposed method presents a significant speed-up compared to the state-of-the-art benchmarks.  ( 2 min )
    Patch-level Neighborhood Interpolation: A General and Effective Graph-based Regularization Strategy. (arXiv:1911.09307v2 [cs.LG] UPDATED)
    Regularization plays a crucial role in machine learning models, especially for deep neural networks. The existing regularization techniques mainly rely on the i.i.d. assumption and only consider the knowledge from the current sample, without the leverage of the neighboring relationship between samples. In this work, we propose a general regularizer called \textbf{Patch-level Neighborhood Interpolation~(Pani)} that conducts a non-local representation in the computation of networks. Our proposal explicitly constructs patch-level graphs in different layers and then linearly interpolates neighborhood patch features, serving as a general and effective regularization strategy. Further, we customize our approach into two kinds of popular regularization methods, namely Virtual Adversarial Training (VAT) and MixUp as well as its variants. The first derived \textbf{Pani VAT} presents a novel way to construct non-local adversarial smoothness by employing patch-level interpolated perturbations. The second derived \textbf{Pani MixUp} method extends the MixUp, and achieves superiority over MixUp and competitive performance over state-of-the-art variants of MixUp method with a significant advantage in computational efficiency. Extensive experiments have verified the effectiveness of our Pani approach in both supervised and semi-supervised settings.  ( 2 min )
    Delay-Aware Hierarchical Federated Learning. (arXiv:2303.12414v4 [cs.LG] UPDATED)
    Federated learning has gained popularity as a means of training models distributed across the wireless edge. The paper introduces delay-aware hierarchical federated learning (DFL) to improve the efficiency of distributed machine learning (ML) model training by accounting for communication delays between edge and cloud. Different from traditional federated learning, DFL leverages multiple stochastic gradient descent iterations on local datasets within each global aggregation period and intermittently aggregates model parameters through edge servers in local subnetworks. During global synchronization, the cloud server consolidates local models with the outdated global model using a local-global combiner, thus preserving crucial elements of both, enhancing learning efficiency under the presence of delay. A set of conditions is obtained to achieve the sub-linear convergence rate of O(1/k) for strongly convex and smooth loss functions. Based on these findings, an adaptive control algorithm is developed for DFL, implementing policies to mitigate energy consumption and communication latency while aiming for sublinear convergence. Numerical evaluations show DFL's superior performance in terms of faster global model convergence, reduced resource consumption, and robustness against communication delays compared to existing FL algorithms. In summary, this proposed method offers improved efficiency and results when dealing with both convex and non-convex loss functions.  ( 2 min )
    On the Trade-offs between Adversarial Robustness and Actionable Explanations. (arXiv:2309.16452v1 [cs.LG])
    As machine learning models are increasingly being employed in various high-stakes settings, it becomes important to ensure that predictions of these models are not only adversarially robust, but also readily explainable to relevant stakeholders. However, it is unclear if these two notions can be simultaneously achieved or if there exist trade-offs between them. In this work, we make one of the first attempts at studying the impact of adversarially robust models on actionable explanations which provide end users with a means for recourse. We theoretically and empirically analyze the cost (ease of implementation) and validity (probability of obtaining a positive model prediction) of recourses output by state-of-the-art algorithms when the underlying models are adversarially robust vs. non-robust. More specifically, we derive theoretical bounds on the differences between the cost and the validity of the recourses generated by state-of-the-art algorithms for adversarially robust vs. non-robust linear and non-linear models. Our empirical results with multiple real-world datasets validate our theoretical results and show the impact of varying degrees of model robustness on the cost and validity of the resulting recourses. Our analyses demonstrate that adversarially robust models significantly increase the cost and reduce the validity of the resulting recourses, thus shedding light on the inherent trade-offs between adversarial robustness and actionable explanations.  ( 2 min )
    Cross-Prediction-Powered Inference. (arXiv:2309.16598v1 [stat.ML])
    While reliable data-driven decision-making hinges on high-quality labeled data, the acquisition of quality labels often involves laborious human annotations or slow and expensive scientific measurements. Machine learning is becoming an appealing alternative as sophisticated predictive techniques are being used to quickly and cheaply produce large amounts of predicted labels; e.g., predicted protein structures are used to supplement experimentally derived structures, predictions of socioeconomic indicators from satellite imagery are used to supplement accurate survey data, and so on. Since predictions are imperfect and potentially biased, this practice brings into question the validity of downstream inferences. We introduce cross-prediction: a method for valid inference powered by machine learning. With a small labeled dataset and a large unlabeled dataset, cross-prediction imputes the missing labels via machine learning and applies a form of debiasing to remedy the prediction inaccuracies. The resulting inferences achieve the desired error probability and are more powerful than those that only leverage the labeled data. Closely related is the recent proposal of prediction-powered inference, which assumes that a good pre-trained model is already available. We show that cross-prediction is consistently more powerful than an adaptation of prediction-powered inference in which a fraction of the labeled data is split off and used to train the model. Finally, we observe that cross-prediction gives more stable conclusions than its competitors; its confidence intervals typically have significantly lower variability.  ( 2 min )
    Generalizable Heterogeneous Federated Cross-Correlation and Instance Similarity Learning. (arXiv:2309.16286v1 [cs.LG])
    Federated learning is an important privacy-preserving multi-party learning paradigm, involving collaborative learning with others and local updating on private data. Model heterogeneity and catastrophic forgetting are two crucial challenges, which greatly limit the applicability and generalizability. This paper presents FCCL+, a novel federated correlation and similarity learning approach with non-target distillation, facilitating both intra-domain discriminability and inter-domain generalization. For the heterogeneity issue, we leverage irrelevant unlabeled public data for communication between the heterogeneous participants. We construct a cross-correlation matrix and align instance similarity distributions on both the logit and feature levels, which effectively overcomes the communication barrier and improves the generalizable ability. For catastrophic forgetting in the local updating stage, FCCL+ introduces Federated Non-Target Distillation, which retains inter-domain knowledge while avoiding the optimization conflict issue, fully distilling privileged inter-domain information by depicting posterior class relations. Considering that there is no standard benchmark for evaluating existing heterogeneous federated learning under the same setting, we present a comprehensive benchmark with extensive representative methods under four domain shift scenarios, supporting both heterogeneous and homogeneous federated settings. Empirical results demonstrate the superiority of our method and the efficiency of modules on various scenarios.  ( 2 min )
    Language models in molecular discovery. (arXiv:2309.16235v1 [physics.chem-ph])
    The success of language models, especially transformer-based architectures, has trickled into other domains giving rise to "scientific language models" that operate on small molecules, proteins or polymers. In chemistry, language models contribute to accelerating the molecule discovery cycle as evidenced by promising recent findings in early-stage drug discovery. Here, we review the role of language models in molecular discovery, underlining their strength in de novo drug design, property prediction and reaction chemistry. We highlight valuable open-source software assets thus lowering the entry barrier to the field of scientific language modeling. Last, we sketch a vision for future molecular design that combines a chatbot interface with access to computational chemistry tools. Our contribution serves as a valuable resource for researchers, chemists, and AI enthusiasts interested in understanding how language models can and will be used to accelerate chemical discovery.  ( 2 min )
    Synthesizing Stable Reduced-Order Visuomotor Policies for Nonlinear Systems via Sums-of-Squares Optimization. (arXiv:2304.12405v2 [cs.RO] UPDATED)
    We present a method for synthesizing dynamic, reduced-order output-feedback polynomial control policies for control-affine nonlinear systems which guarantees runtime stability to a goal state, when using visual observations and a learned perception module in the feedback control loop. We leverage Lyapunov analysis to formulate the problem of synthesizing such policies. This problem is nonconvex in the policy parameters and the Lyapunov function that is used to prove the stability of the policy. To solve this problem approximately, we propose two approaches: the first solves a sequence of sum-of-squares optimization problems to iteratively improve a policy which is provably-stable by construction, while the second directly performs gradient-based optimization on the parameters of the polynomial policy, and its closed-loop stability is verified a posteriori. We extend our approach to provide stability guarantees in the presence of observation noise, which realistically arises due to errors in the learned perception module. We evaluate our approach on several underactuated nonlinear systems, including pendula and quadrotors, showing that our guarantees translate to empirical stability when controlling these systems from images, while baseline approaches can fail to reliably stabilize the system.
    Probabilistic Invariant Learning with Randomized Linear Classifiers. (arXiv:2308.04412v2 [cs.LG] UPDATED)
    Designing models that are both expressive and preserve known invariances of tasks is an increasingly hard problem. Existing solutions tradeoff invariance for computational or memory resources. In this work, we show how to leverage randomness and design models that are both expressive and invariant but use less resources. Inspired by randomized algorithms, our key insight is that accepting probabilistic notions of universal approximation and invariance can reduce our resource requirements. More specifically, we propose a class of binary classification models called Randomized Linear Classifiers (RLCs). We give parameter and sample size conditions in which RLCs can, with high probability, approximate any (smooth) function while preserving invariance to compact group transformations. Leveraging this result, we design three RLCs that are provably probabilistic invariant for classification tasks over sets, graphs, and spherical data. We show how these models can achieve probabilistic invariance and universality using less resources than (deterministic) neural networks and their invariant counterparts. Finally, we empirically demonstrate the benefits of this new class of models on invariant tasks where deterministic invariant neural networks are known to struggle.
    Set Learning for Accurate and Calibrated Models. (arXiv:2307.02245v3 [cs.LG] UPDATED)
    Model overconfidence and poor calibration are common in machine learning and difficult to account for when applying standard empirical risk minimization. In this work, we propose a novel method to alleviate these problems that we call odd-$k$-out learning (OKO), which minimizes the cross-entropy error for sets rather than for single examples. This naturally allows the model to capture correlations across data examples and achieves both better accuracy and calibration, especially in limited training data and class-imbalanced regimes. Perhaps surprisingly, OKO often yields better calibration even when training with hard labels and dropping any additional calibration parameter tuning, such as temperature scaling. We provide theoretical justification, establishing that OKO naturally yields better calibration, and provide extensive experimental analyses that corroborate our theoretical findings. We emphasize that OKO is a general framework that can be easily adapted to many settings and the trained model can be applied to single examples at inference time, without introducing significant run-time overhead or architecture changes.
    Bringing the Discussion of Minima Sharpness to the Audio Domain: a Filter-Normalised Evaluation for Acoustic Scene Classification. (arXiv:2309.16369v1 [cs.SD])
    The correlation between the sharpness of loss minima and generalisation in the context of deep neural networks has been subject to discussion for a long time. Whilst mostly investigated in the context of selected benchmark data sets in the area of computer vision, we explore this aspect for the audio scene classification task of the DCASE2020 challenge data. Our analysis is based on two-dimensional filter-normalised visualisations and a derived sharpness measure. Our exploratory analysis shows that sharper minima tend to show better generalisation than flat minima (even more so for out-of-domain data recorded from previously unseen devices), thus adding to the dispute about better generalisation capabilities of flat minima. We further find that, in particular, the choice of optimisers is a main driver of the sharpness of minima and we discuss resulting limitations with respect to comparability. Our code, trained model states and loss landscape visualisations are publicly available.
    Attribute Graph Clustering via Learnable Augmentation. (arXiv:2212.03559v2 [cs.LG] UPDATED)
    Contrastive deep graph clustering (CDGC) utilizes contrastive learning to group nodes into different clusters. Better augmentation techniques benefit the quality of the contrastive samples, making them one of the key factors for improving performance. However, the augmentation samples in existing methods are always predefined by human experience, and agnostic to the downstream clustering task, thus leading to high human resource costs and poor performance. To this end, we propose an Attribute Graph Clustering method via Learnable Augmentation (\textbf{AGCLA}), which introduces learnable augmentors for high-quality and suitable augmented samples for CDGC. Specifically, we design two learnable augmentors for attribute and structure information, respectively. Besides, two refinement matrices, including the high-confidence pseudo-label matrix and the cross-view sample similarity matrix, are generated to improve the reliability of the learned affinity matrix. During the training procedure, we notice that there exist differences between the optimization goals for training learnable augmentors and contrastive learning networks. In other words, we should both guarantee the consistency of the embeddings as well as the diversity of the augmented samples. Thus, an adversarial learning mechanism is designed in our method. Moreover, a two-stage training strategy is leveraged for the high-confidence refinement matrices. Extensive experimental results demonstrate the effectiveness of AGCLA on six benchmark datasets.
    DynaBench: A benchmark dataset for learning dynamical systems from low-resolution data. (arXiv:2306.05805v2 [cs.LG] UPDATED)
Previous work on learning physical systems from data has focused on high-resolution grid-structured measurements. However, real-world knowledge of such systems (e.g. weather data) relies on sparsely scattered measuring stations. In this paper, we introduce a novel simulated benchmark dataset, DynaBench, for learning dynamical systems directly from sparsely scattered data without prior knowledge of the equations. The dataset focuses on predicting the evolution of a dynamical system from low-resolution, unstructured measurements. We simulate six different partial differential equations covering a variety of physical systems commonly used in the literature and evaluate several machine learning models, including traditional graph neural networks and point cloud processing models, on the task of predicting the evolution of the system. The proposed benchmark dataset is expected to advance the state of the art as an out-of-the-box, easy-to-use tool for evaluating models in a setting where only unstructured low-resolution observations are available. The benchmark is available at https://anonymous.4open.science/r/code-2022-dynabench/.
    Vertical Federated Learning: Concepts, Advances and Challenges. (arXiv:2211.12814v4 [cs.LG] UPDATED)
Vertical Federated Learning (VFL) is a federated learning setting where multiple parties with different features about the same set of users jointly train machine learning models without exposing their raw data or model parameters. Motivated by the rapid growth in VFL research and real-world applications, we provide a comprehensive review of the concept and algorithms of VFL, as well as current advances and challenges in various aspects, including effectiveness, efficiency, and privacy. We provide an exhaustive categorization of VFL settings and privacy-preserving protocols and comprehensively analyze the privacy attacks and defense strategies for each protocol. We then propose a unified framework, termed VFLow, which considers the VFL problem under communication, computation, privacy, effectiveness, and fairness constraints. Finally, we review the most recent advances in industrial applications, highlighting open challenges and future directions for VFL.
    The Devil is in the Details: A Deep Dive into the Rabbit Hole of Data Filtering. (arXiv:2309.15954v1 [cs.CV])
The quality of pre-training data plays a critical role in the performance of foundation models. Popular foundation models often design their own recipe for data filtering, which makes it hard to analyze and compare different data filtering approaches. DataComp is a new benchmark dedicated to evaluating different methods for data filtering. This paper describes our findings and solution from participating in the DataComp challenge. Our filtering strategy includes three stages: single-modality filtering, cross-modality filtering, and data distribution alignment. We integrate existing methods and propose new solutions, such as computing the CLIP score on horizontally flipped images to mitigate the interference of scene text, using vision and language models to retrieve training samples for target downstream tasks, and rebalancing the data distribution to improve the efficiency of allocating the computational budget. We slice and dice our design choices, provide in-depth analysis, and discuss open questions. Our approach outperforms the best method from the DataComp paper by over 4% on the average performance of 38 tasks and by over 2% on ImageNet.
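The flipped-image CLIP score mentioned above can be sketched as follows; the snippet assumes a CLIP-style model exposing `encode_image`/`encode_text` (the handle and preprocessing are hypothetical) and simply mirrors the image before scoring, so rendered scene text no longer trivially matches the caption.

```python
# Hedged sketch: CLIP similarity computed on a horizontally flipped image.
import torch
import torchvision.transforms.functional as TF

@torch.no_grad()
def flipped_clip_score(clip_model, image_tensor, text_tokens):
    flipped = TF.hflip(image_tensor)                    # mirror the image to suppress scene-text cues
    img = clip_model.encode_image(flipped.unsqueeze(0))
    txt = clip_model.encode_text(text_tokens)
    img = img / img.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    return (img * txt).sum(dim=-1)                      # cosine similarity
```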
    Machine Learning Based Analytics for the Significance of Gait Analysis in Monitoring and Managing Lower Extremity Injuries. (arXiv:2309.15990v1 [cs.LG])
This study explored the potential of gait analysis as a tool for assessing post-injury complications, e.g., infection, malunion, or hardware irritation, in patients with lower extremity fractures. The research focused on the proficiency of supervised machine learning models in predicting complications using consecutive gait datasets. We identified patients with lower extremity fractures at an academic center. Patients underwent gait analysis with a chest-mounted IMU device. Raw gait data were preprocessed in software, with emphasis on 12 essential gait variables. Machine learning models including XGBoost, Logistic Regression, SVM, LightGBM, and Random Forest were trained, tested, and evaluated, and class imbalance was addressed using SMOTE. We introduced a methodology to compute the Rate of Change (ROC) for gait variables, independent of the time difference between gait analyses. XGBoost was the optimal model both before and after applying SMOTE. Prior to SMOTE, the model achieved an average test AUC of 0.90 (95% CI: [0.79, 1.00]) and test accuracy of 86% (95% CI: [75%, 97%]). Feature importance analysis highlighted the duration between injury and gait analysis. Data patterns showed early physiological compensations followed by stabilization phases, emphasizing the value of prompt gait analysis. This study underscores the potential of machine learning, particularly XGBoost, in gait analysis for orthopedic care. For predicting post-injury complications, early gait assessment is vital and reveals intervention points. The findings support a shift in orthopedics towards a data-informed approach, enhancing patient outcomes.
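A hedged sketch of the modeling pipeline described above is shown below; `X` and `y` stand in for the 12 preprocessed gait variables and the complication labels, and the hyperparameters are illustrative rather than those used in the study.

```python
# Sketch: SMOTE oversampling followed by an XGBoost classifier and AUC evaluation.
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier

# X: (n_patients, 12) gait features; y: binary complication labels (placeholders).
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)  # balance the minority class
clf = XGBClassifier(n_estimators=200, max_depth=4, eval_metric="logloss")
clf.fit(X_res, y_res)
print("test AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
```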
    Neuro-Inspired Hierarchical Multimodal Learning. (arXiv:2309.15877v1 [cs.LG])
Integrating and processing information from various sources or modalities is critical for obtaining a comprehensive and accurate perception of the real world. Drawing inspiration from neuroscience, we develop the Information-Theoretic Hierarchical Perception (ITHP) model, which utilizes the concept of the information bottleneck. Distinct from most traditional fusion models that aim to incorporate all modalities as input, our model designates the prime modality as input, while the remaining modalities act as detectors in the information pathway. Our proposed perception model focuses on constructing an effective and compact information flow by balancing the minimization of mutual information between the latent state and the input modal state against the maximization of mutual information between the latent states and the remaining modal states. This approach leads to compact latent state representations that retain relevant information while minimizing redundancy, thereby substantially enhancing the performance of downstream tasks. Experimental evaluations on both the MUStARD and CMU-MOSI datasets demonstrate that our model consistently distills crucial information in multimodal learning scenarios, outperforming state-of-the-art benchmarks.
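Schematically, and with notation not taken from the paper, the trade-off described above resembles a variational information-bottleneck objective of the form

```latex
\min_{p(z \mid x_0)} \; I(Z; X_0) \;-\; \beta \sum_{m \ge 1} I(Z; X_m), \qquad \beta > 0,
```

where $X_0$ is the prime (input) modality, the $X_m$ are the remaining detector modalities, and $Z$ is the latent state; the hierarchical ITHP model chains such terms along the information pathway rather than optimizing a single flat objective.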
    Distill to Delete: Unlearning in Graph Networks with Knowledge Distillation. (arXiv:2309.16173v1 [cs.LG])
Graph unlearning has emerged as a pivotal method to delete information from a pre-trained graph neural network (GNN). One may delete nodes, a class of nodes, edges, or a class of edges. An unlearning method enables the GNN model to comply with data protection regulations (i.e., the right to be forgotten), adapt to evolving data distributions, and reduce the GPU-hours carbon footprint by avoiding repetitive retraining. Existing partitioning and aggregation-based methods have limitations due to their poor handling of local graph dependencies and additional overhead costs. More recently, GNNDelete offered a model-agnostic approach that alleviates some of these issues. Our work takes a novel approach to these challenges in graph unlearning through knowledge distillation: it distills to delete in GNNs (D2DGN). D2DGN is a model-agnostic distillation framework in which the complete graph knowledge is divided and marked for retention and deletion. It performs distillation with response-based soft targets and feature-based node embeddings while minimizing the KL divergence. The unlearned model effectively removes the influence of deleted graph elements while preserving knowledge about the retained graph elements. D2DGN surpasses the performance of existing methods when evaluated on various real-world graph datasets by up to $43.1\%$ (AUC) in edge and node unlearning tasks. Other notable advantages include better efficiency, better performance in removing target elements, preservation of performance for the retained elements, and zero overhead costs. Notably, our D2DGN surpasses the state-of-the-art GNNDelete in AUC by $2.4\%$, improves the membership inference ratio by $+1.3$, requires $10.2\times10^6$ fewer FLOPs per forward pass, and is up to $\mathbf{3.2}\times$ faster.
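The retain/delete distillation idea can be sketched with a simple KL-based loss; the target shaping below (matching the teacher on retained predictions and pushing deleted ones toward a uniform distribution) is schematic, and the paper's exact targets, masks, and weights may differ.

```python
# Illustrative response-based distillation loss with a retain/delete split.
import torch
import torch.nn.functional as F

def unlearn_distill_loss(student_logits, teacher_logits, retain_mask, T=2.0):
    """logits: (N, C); retain_mask: (N,) bool marking elements whose knowledge is kept."""
    s = F.log_softmax(student_logits / T, dim=-1)
    t = F.softmax(teacher_logits / T, dim=-1)
    # Match the teacher on retained elements.
    keep = F.kl_div(s[retain_mask], t[retain_mask], reduction="batchmean") * T * T
    # Push deleted elements toward an uninformative (uniform) distribution.
    uniform = torch.full_like(t[~retain_mask], 1.0 / t.size(-1))
    forget = F.kl_div(s[~retain_mask], uniform, reduction="batchmean") * T * T
    return keep + forget
```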
    Can LLMs Effectively Leverage Structural Information for Graph Learning: When and Why. (arXiv:2309.16595v1 [cs.LG])
    This paper studies Large Language Models (LLMs) for structured data--particularly graphs--a crucial data modality that remains underexplored in the LLM literature. We aim to understand when and why the incorporation of structural information inherent in graph data can improve the prediction performance of LLMs on node classification tasks. To address the ``when'' question, we examine a variety of prompting methods for encoding structural information, in settings where textual node features are either rich or scarce. For the ``why'' questions, we probe into two potential contributing factors to the LLM performance: data leakage and homophily. Our exploration of these questions reveals that (i) LLMs can benefit from structural information, especially when textual node features are scarce; (ii) there is no substantial evidence indicating that the performance of LLMs is significantly attributed to data leakage; and (iii) the performance of LLMs on a target node is strongly positively related to the local homophily ratio of the node.
    MotionLM: Multi-Agent Motion Forecasting as Language Modeling. (arXiv:2309.16534v1 [cs.CV])
    Reliable forecasting of the future behavior of road agents is a critical component to safe planning in autonomous vehicles. Here, we represent continuous trajectories as sequences of discrete motion tokens and cast multi-agent motion prediction as a language modeling task over this domain. Our model, MotionLM, provides several advantages: First, it does not require anchors or explicit latent variable optimization to learn multimodal distributions. Instead, we leverage a single standard language modeling objective, maximizing the average log probability over sequence tokens. Second, our approach bypasses post-hoc interaction heuristics where individual agent trajectory generation is conducted prior to interactive scoring. Instead, MotionLM produces joint distributions over interactive agent futures in a single autoregressive decoding process. In addition, the model's sequential factorization enables temporally causal conditional rollouts. The proposed approach establishes new state-of-the-art performance for multi-agent motion prediction on the Waymo Open Motion Dataset, ranking 1st on the interactive challenge leaderboard.
    Tensor Factorization for Leveraging Cross-Modal Knowledge in Data-Constrained Infrared Object Detection. (arXiv:2309.16592v1 [cs.CV])
The primary bottleneck towards obtaining good recognition performance in IR images is the lack of sufficient labeled training data, owing to the cost of acquiring such data. Realizing that object detection methods for the RGB modality are quite robust (at least for some commonplace classes, like person, car, etc.), thanks to the giant training sets that exist, in this work we seek to leverage cues from the RGB modality to scale object detectors to the IR modality, while preserving model performance in the RGB modality. At the core of our method is a novel tensor decomposition method called TensorFact, which splits the convolution kernels of a layer of a Convolutional Neural Network (CNN) into low-rank factor matrices with fewer parameters than the original CNN. We first pretrain these factor matrices on the RGB modality, for which plenty of training data are assumed to exist, and then augment only a few trainable parameters for training on the IR modality to avoid over-fitting, while encouraging them to capture complementary cues from those trained only on the RGB modality. We validate our approach empirically by first assessing how well our TensorFact-decomposed network performs at the task of detecting objects in RGB images vis-a-vis the original network, and then looking at how well it adapts to IR images of the FLIR ADAS v1 dataset. For the latter, we train models under scenarios that pose challenges stemming from data paucity. From the experiments, we observe that: (i) TensorFact shows performance gains on RGB images; (ii) further, this pre-trained model, when fine-tuned, outperforms a standard state-of-the-art object detector on the FLIR ADAS v1 dataset by about 4% in terms of mAP 50 score.
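A generic way to see how a convolution layer can be split into low-rank factor matrices is sketched below; this uses a plain truncated SVD of the flattened kernel and is not the paper's TensorFact decomposition, only an illustration of the parameter saving.

```python
# Sketch: factor a conv weight into two low-rank matrices via truncated SVD.
import torch

def low_rank_factors(conv_weight, rank):
    """conv_weight: (out_ch, in_ch, kh, kw). Returns A (out_ch, rank) and B (rank, in_ch*kh*kw)
    with W ~ A @ B, using rank*(out_ch + in_ch*kh*kw) parameters instead of out_ch*in_ch*kh*kw."""
    out_ch, in_ch, kh, kw = conv_weight.shape
    W = conv_weight.reshape(out_ch, in_ch * kh * kw)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # absorb singular values into the left factor
    B = Vh[:rank, :]
    return A, B
```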
    Towards Poisoning Fair Representations. (arXiv:2309.16487v1 [cs.LG])
Fair machine learning seeks to mitigate model prediction bias against certain demographic subgroups, such as the elderly and women. Recently, fair representation learning (FRL) trained by deep neural networks has demonstrated superior performance, whereby representations containing no demographic information are inferred from the data and then used as the input to classification or other downstream tasks. Despite the development of FRL methods, their vulnerability under data poisoning attack, a popular protocol to benchmark model robustness under adversarial scenarios, is under-explored. Data poisoning attacks have been developed for classical fair machine learning methods, which incorporate fairness constraints into shallow-model classifiers. Nonetheless, these attacks fall short in FRL due to notably different fairness goals and model architectures. This work proposes the first data poisoning framework attacking FRL. We induce the model to output unfair representations that contain as much demographic information as possible by injecting carefully crafted poisoning samples into the training data. This attack entails a prohibitive bilevel optimization, for which an effective approximate solution is proposed. A theoretical analysis of the number of poisoning samples needed is derived and sheds light on defending against the attack. Experiments on benchmark fairness datasets and state-of-the-art fair representation learning models demonstrate the superiority of our attack.
    Differentially Private Secure Multiplication: Hiding Information in the Rubble of Noise. (arXiv:2309.16105v1 [cs.IT])
We consider the problem of private distributed multi-party multiplication. It is well-established that Shamir secret-sharing coding strategies can enable perfect information-theoretic privacy in distributed computation via the celebrated algorithm of Ben-Or, Goldwasser and Wigderson (the "BGW algorithm"). However, perfect privacy and accuracy require an honest majority, that is, $N \geq 2t+1$ compute nodes are required to ensure privacy against any $t$ colluding adversarial nodes. By allowing for some controlled amount of information leakage and approximate multiplication instead of exact multiplication, we study coding schemes for the setting where the number of honest nodes can be a minority, that is, $N < 2t+1$. We develop a tight characterization of the privacy-accuracy trade-off for cases where $N < 2t+1$ by measuring information leakage using differential privacy instead of perfect privacy, and using the mean squared error metric for accuracy. A novel technical aspect is an intricately layered noise distribution that merges ideas from differential privacy and Shamir secret-sharing at different layers.  ( 2 min )
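For background, the sketch below shows plain Shamir secret sharing over a prime field, the primitive that BGW-style protocols build on; the field choice and API are illustrative, and the paper's contribution is the layered noise added on top of such sharing, which is not shown here.

```python
# Minimal Shamir secret sharing sketch: any t+1 of n shares reconstruct the secret.
import random

P = 2**61 - 1  # illustrative prime modulus for the finite field

def share(secret, t, n):
    coeffs = [secret % P] + [random.randrange(P) for _ in range(t)]  # degree-t polynomial, f(0) = secret
    return [(i, sum(c * pow(i, k, P) for k, c in enumerate(coeffs)) % P) for i in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation of f at x = 0 from t+1 points."""
    total = 0
    for j, (xj, yj) in enumerate(shares):
        num, den = 1, 1
        for m, (xm, _) in enumerate(shares):
            if m != j:
                num = num * (-xm) % P
                den = den * (xj - xm) % P
        total = (total + yj * num * pow(den, P - 2, P)) % P  # modular inverse via Fermat's little theorem
    return total
```

Multiplying shares pointwise yields shares of a degree-$2t$ polynomial, which is why an honest majority ($N \geq 2t+1$) is normally needed for exact private multiplication.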
    Constructing Synthetic Treatment Groups without the Mean Exchangeability Assumption. (arXiv:2309.16409v1 [stat.ML])
    The purpose of this work is to transport the information from multiple randomized controlled trials to the target population where we only have the control group data. Previous works rely critically on the mean exchangeability assumption. However, as pointed out by many current studies, the mean exchangeability assumption might be violated. Motivated by the synthetic control method, we construct a synthetic treatment group for the target population by a weighted mixture of treatment groups of source populations. We estimate the weights by minimizing the conditional maximum mean discrepancy between the weighted control groups of source populations and the target population. We establish the asymptotic normality of the synthetic treatment group estimator based on the sieve semiparametric theory. Our method can serve as a novel complementary approach when the mean exchangeability assumption is violated. Experiments are conducted on synthetic and real-world datasets to demonstrate the effectiveness of our methods.  ( 2 min )
    LawBench: Benchmarking Legal Knowledge of Large Language Models. (arXiv:2309.16289v1 [cs.CL])
Large language models (LLMs) have demonstrated strong capabilities in various aspects. However, when applying them to the highly specialized, safety-critical legal domain, it is unclear how much legal knowledge they possess and whether they can reliably perform legal-related tasks. To address this gap, we propose a comprehensive evaluation benchmark, LawBench. LawBench has been meticulously crafted to provide a precise assessment of the LLMs' legal capabilities from three cognitive levels: (1) Legal knowledge memorization: whether LLMs can memorize needed legal concepts, articles and facts; (2) Legal knowledge understanding: whether LLMs can comprehend entities, events and relationships within legal text; (3) Legal knowledge application: whether LLMs can properly utilize their legal knowledge and make the necessary reasoning steps to solve realistic legal tasks. LawBench contains 20 diverse tasks covering 5 task types: single-label classification (SLC), multi-label classification (MLC), regression, extraction and generation. We perform extensive evaluations of 51 LLMs on LawBench, including 20 multilingual LLMs, 22 Chinese-oriented LLMs and 9 legal-specific LLMs. The results show that GPT-4 remains the best-performing LLM in the legal domain, surpassing the others by a significant margin. While fine-tuning LLMs on legal-specific text brings certain improvements, we are still a long way from obtaining usable and reliable LLMs for legal tasks. All data, model predictions and evaluation code are released at https://github.com/open-compass/LawBench/. We hope this benchmark provides an in-depth understanding of the LLMs' domain-specific capabilities and speeds up the development of LLMs in the legal domain.  ( 3 min )
    Stackelberg Batch Policy Learning. (arXiv:2309.16188v1 [stat.ML])
Batch reinforcement learning (RL) defines the task of learning from a fixed batch of data lacking exhaustive exploration. Worst-case optimality algorithms, which calibrate a value-function model class from logged experience and perform some type of pessimistic evaluation under the learned model, have emerged as a promising paradigm for batch RL. However, contemporary works in this stream have commonly overlooked the hierarchical decision-making structure hidden in the optimization landscape. In this paper, we adopt a game-theoretical viewpoint and model the policy learning problem as a two-player general-sum game with a leader-follower structure. We propose a novel stochastic gradient-based learning algorithm: StackelbergLearner, in which the leader player updates according to the total derivative of its objective instead of the usual individual gradient, and the follower player makes individual updates and ensures transition-consistent pessimistic reasoning. The derived learning dynamic naturally lends StackelbergLearner to a game-theoretic interpretation and provides a convergence guarantee to differentiable Stackelberg equilibria. From a theoretical standpoint, we provide instance-dependent regret bounds with general function approximation, which show that our algorithm can learn a best-effort policy that is able to compete against any comparator policy covered by the batch data. Notably, our theoretical regret guarantees only require realizability, without any data coverage or strong function approximation conditions, e.g., Bellman closedness, in contrast to prior works lacking such guarantees. Through comprehensive experiments, we find that our algorithm consistently performs as well as or better than state-of-the-art methods on batch RL benchmarks and real-world datasets.  ( 2 min )
    Improving Adaptive Online Learning Using Refined Discretization. (arXiv:2309.16044v1 [cs.LG])
    We study unconstrained Online Linear Optimization with Lipschitz losses. The goal is to simultaneously achieve ($i$) second order gradient adaptivity; and ($ii$) comparator norm adaptivity also known as "parameter freeness" in the literature. Existing regret bounds (Cutkosky and Orabona, 2018; Mhammedi and Koolen, 2020; Jacobsen and Cutkosky, 2022) have the suboptimal $O(\sqrt{V_T\log V_T})$ dependence on the gradient variance $V_T$, while the present work improves it to the optimal rate $O(\sqrt{V_T})$ using a novel continuous-time-inspired algorithm, without any impractical doubling trick. This result can be extended to the setting with unknown Lipschitz constant, eliminating the range ratio problem from prior works (Mhammedi and Koolen, 2020). Concretely, we first show that the aimed simultaneous adaptivity can be achieved fairly easily in a continuous time analogue of the problem, where the environment is modeled by an arbitrary continuous semimartingale. Then, our key innovation is a new discretization argument that preserves such adaptivity in the discrete time adversarial setting. This refines a non-gradient-adaptive discretization argument from (Harvey et al., 2023), both algorithmically and analytically, which could be of independent interest.  ( 2 min )
    Masked autoencoders are scalable learners of cellular morphology. (arXiv:2309.16064v1 [cs.CV])
    Inferring biological relationships from cellular phenotypes in high-content microscopy screens provides significant opportunity and challenge in biological research. Prior results have shown that deep vision models can capture biological signal better than hand-crafted features. This work explores how weakly supervised and self-supervised deep learning approaches scale when training larger models on larger datasets. Our results show that both CNN- and ViT-based masked autoencoders significantly outperform weakly supervised models. At the high-end of our scale, a ViT-L/8 trained on over 3.5-billion unique crops sampled from 95-million microscopy images achieves relative improvements as high as 28% over our best weakly supervised models at inferring known biological relationships curated from public databases.  ( 2 min )
    Deep Learning Based Uplink Multi-User SIMO Beamforming Design. (arXiv:2309.16603v1 [cs.IT])
The advancement of fifth generation (5G) wireless communication networks has created a greater demand for wireless resource management solutions that offer high data rates, extensive coverage, minimal latency and energy-efficient performance. Nonetheless, traditional approaches have shortcomings when it comes to computational complexity and their ability to adapt to dynamic conditions, creating a gap between theoretical analysis and the practical execution of algorithmic solutions for managing wireless resources. Deep learning-based techniques offer promising solutions for bridging this gap with their substantial representation capabilities. We propose a novel unsupervised deep learning framework, called NNBF, for the design of uplink receive multi-user single-input multiple-output (MU-SIMO) beamforming. The primary objective is to enhance the throughput by focusing on maximizing the sum-rate while also offering a computationally efficient solution, in contrast to established conventional methods. We conduct experiments for several antenna configurations. Our experimental results demonstrate that NNBF exhibits superior performance compared to our baseline methods, namely zero-forcing beamforming (ZFBF) and the minimum mean square error (MMSE) equalizer. Additionally, NNBF is scalable with the number of single-antenna user equipments (UEs), while the baseline methods have a significant computational burden due to matrix pseudo-inverse operations.
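For reference, the zero-forcing baseline mentioned above amounts to applying the channel pseudo-inverse at the receiver; the sketch below is a textbook formulation with illustrative shapes, not the paper's code.

```python
# Sketch: zero-forcing detection for uplink MU-SIMO via the channel pseudo-inverse.
import numpy as np

def zero_forcing_detect(H, y):
    """H: (n_rx_antennas, n_users) complex channel matrix; y: (n_rx_antennas,) received vector."""
    W = np.linalg.pinv(H)   # (n_users, n_rx_antennas) zero-forcing combiner
    return W @ y            # per-user symbol estimates (noise may be amplified for ill-conditioned H)
```

The pseudo-inverse is also the source of the computational burden noted above as the number of UEs grows.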
    Astroconformer: The Prospects of Analyzing Stellar Light Curves with Transformer-Based Deep Learning Models. (arXiv:2309.16316v1 [astro-ph.SR])
    Light curves of stars encapsulate a wealth of information about stellar oscillations and granulation, thereby offering key insights into the internal structure and evolutionary state of stars. Conventional asteroseismic techniques have been largely confined to power spectral analysis, neglecting the valuable phase information contained within light curves. While recent machine learning applications in asteroseismology utilizing Convolutional Neural Networks (CNNs) have successfully inferred stellar attributes from light curves, they are often limited by the local feature extraction inherent in convolutional operations. To circumvent these constraints, we present $\textit{Astroconformer}$, a Transformer-based deep learning framework designed to capture long-range dependencies in stellar light curves. Our empirical analysis, which focuses on estimating surface gravity ($\log g$), is grounded in a carefully curated dataset derived from $\textit{Kepler}$ light curves. These light curves feature asteroseismic $\log g$ values spanning from 0.2 to 4.4. Our results underscore that, in the regime where the training data is abundant, $\textit{Astroconformer}$ attains a root-mean-square-error (RMSE) of 0.017 dex around $\log g \approx 3 $. Even in regions where training data are sparse, the RMSE can reach 0.1 dex. It outperforms not only the K-nearest neighbor-based model ($\textit{The SWAN}$) but also state-of-the-art CNNs. Ablation studies confirm that the efficacy of the models in this particular task is strongly influenced by the size of their receptive fields, with larger receptive fields correlating with enhanced performance. Moreover, we find that the attention mechanisms within $\textit{Astroconformer}$ are well-aligned with the inherent characteristics of stellar oscillations and granulation present in the light curves.
    Towards Best Practices of Activation Patching in Language Models: Metrics and Methods. (arXiv:2309.16042v1 [cs.LG])
    Mechanistic interpretability seeks to understand the internal mechanisms of machine learning models, where localization -- identifying the important model components -- is a key step. Activation patching, also known as causal tracing or interchange intervention, is a standard technique for this task (Vig et al., 2020), but the literature contains many variants with little consensus on the choice of hyperparameters or methodology. In this work, we systematically examine the impact of methodological details in activation patching, including evaluation metrics and corruption methods. In several settings of localization and circuit discovery in language models, we find that varying these hyperparameters could lead to disparate interpretability results. Backed by empirical observations, we give conceptual arguments for why certain metrics or methods may be preferred. Finally, we provide recommendations for the best practices of activation patching going forwards.
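As an illustration of the basic operation, the sketch below patches one module's activation from a clean run into a corrupted run using PyTorch forward hooks; the choice of module, inputs, and the metric applied to the patched logits are exactly the methodological details the paper examines, so treat this as scaffolding only.

```python
# Hedged sketch of activation patching with forward hooks.
import torch

def run_with_patch(model, module, clean_input, corrupted_input):
    cache = {}

    def save_hook(mod, inputs, output):
        cache["act"] = output.detach()      # cache the clean activation

    def patch_hook(mod, inputs, output):
        return cache["act"]                 # returning a value replaces the module's output

    handle = module.register_forward_hook(save_hook)
    with torch.no_grad():
        model(clean_input)
    handle.remove()

    handle = module.register_forward_hook(patch_hook)
    with torch.no_grad():
        patched_logits = model(corrupted_input)
    handle.remove()
    return patched_logits
```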
    GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization. (arXiv:2309.16020v1 [cs.CV])
    Worldwide Geo-localization aims to pinpoint the precise location of images taken anywhere on Earth. This task has considerable challenges due to immense variation in geographic landscapes. The image-to-image retrieval-based approaches fail to solve this problem on a global scale as it is not feasible to construct a large gallery of images covering the entire world. Instead, existing approaches divide the globe into discrete geographic cells, transforming the problem into a classification task. However, their performance is limited by the predefined classes and often results in inaccurate localizations when an image's location significantly deviates from its class center. To overcome these limitations, we propose GeoCLIP, a novel CLIP-inspired Image-to-GPS retrieval approach that enforces alignment between the image and its corresponding GPS locations. GeoCLIP's location encoder models the Earth as a continuous function by employing positional encoding through random Fourier features and constructing a hierarchical representation that captures information at varying resolutions to yield a semantically rich high-dimensional feature suitable to use even beyond geo-localization. To the best of our knowledge, this is the first work employing GPS encoding for geo-localization. We demonstrate the efficacy of our method via extensive experiments and ablations on benchmark datasets. We achieve competitive performance with just 20% of training data, highlighting its effectiveness even in limited-data settings. Furthermore, we qualitatively demonstrate geo-localization using a text query by leveraging CLIP backbone of our image encoder.
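The continuous location encoding described above can be sketched with random Fourier features; the dimensions, frequency scale, and coordinate normalization below are illustrative assumptions rather than GeoCLIP's exact configuration.

```python
# Sketch: random Fourier feature encoding of (lat, lon) coordinates.
import math
import torch

class RFFEncoding(torch.nn.Module):
    def __init__(self, in_dim=2, num_features=256, sigma=1.0):
        super().__init__()
        # Fixed random projection matrix; kept as a buffer so it is not trained.
        self.register_buffer("B", torch.randn(in_dim, num_features) * sigma)

    def forward(self, coords):
        """coords: (batch, 2) normalized latitude/longitude."""
        proj = 2 * math.pi * coords @ self.B
        return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)  # (batch, 2*num_features)
```

One way to obtain the multi-resolution hierarchy mentioned above would be to stack several such encodings with different frequency scales `sigma`.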
    Correcting for heterogeneity in real-time epidemiological indicators. (arXiv:2309.16546v1 [cs.LG])
Auxiliary data sources have become increasingly important in epidemiological surveillance, as they are often available at a finer spatial and temporal resolution, larger coverage, and lower latency than traditional surveillance signals. We describe the problem of spatial and temporal heterogeneity in signals derived from these data sources, where spatial and/or temporal biases are present. We present a method that uses a ``guiding'' signal to correct for these biases and produce a more reliable signal that can be used for modeling and forecasting. The method assumes that the heterogeneity can be approximated by a low-rank matrix and that the temporal heterogeneity is smooth over time. We also present a hyperparameter selection algorithm to choose the parameters representing the matrix rank and the degree of temporal smoothness of the corrections. In the absence of ground truth, we use maps and plots to argue that this method does indeed reduce heterogeneity. Reducing heterogeneity from auxiliary data sources greatly increases their utility in modeling and forecasting epidemics.
    High-dimensional robust regression under heavy-tailed data: Asymptotics and Universality. (arXiv:2309.16476v1 [math.ST])
    We investigate the high-dimensional properties of robust regression estimators in the presence of heavy-tailed contamination of both the covariates and response functions. In particular, we provide a sharp asymptotic characterisation of M-estimators trained on a family of elliptical covariate and noise data distributions including cases where second and higher moments do not exist. We show that, despite being consistent, the Huber loss with optimally tuned location parameter $\delta$ is suboptimal in the high-dimensional regime in the presence of heavy-tailed noise, highlighting the necessity of further regularisation to achieve optimal performance. This result also uncovers the existence of a curious transition in $\delta$ as a function of the sample complexity and contamination. Moreover, we derive the decay rates for the excess risk of ridge regression. We show that, while it is both optimal and universal for noise distributions with finite second moment, its decay rate can be considerably faster when the covariates' second moment does not exist. Finally, we show that our formulas readily generalise to a richer family of models and data distributions, such as generalised linear estimation with arbitrary convex regularisation trained on mixture models.
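For reference, the Huber loss with threshold (location) parameter $\delta$ discussed above takes the standard form

```latex
\rho_\delta(u) \;=\;
\begin{cases}
\tfrac{1}{2}\,u^2, & |u| \le \delta,\\[2pt]
\delta\left(|u| - \tfrac{1}{2}\,\delta\right), & |u| > \delta,
\end{cases}
```

quadratic near the origin and linear in the tails, which is why the tuning of $\delta$ interacts with heavy-tailed contamination and why additional regularisation can still help in the regime the paper studies.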
    A Spectral Approach for Learning Spatiotemporal Neural Differential Equations. (arXiv:2309.16131v1 [cs.LG])
The rapid development of machine learning methods has stimulated research interest in computationally reconstructing differential equations (DEs) from observational data, which may provide additional insight into underlying causative mechanisms. In this paper, we propose a novel neural-ODE based method that uses spectral expansions in space to learn spatiotemporal DEs. The major advantage of our spectral neural DE learning approach is that it does not rely on spatial discretization, thus allowing the target spatiotemporal equations to contain long-range, nonlocal spatial interactions that act on unbounded spatial domains. Our spectral approach is shown to be as accurate as some of the latest machine learning approaches for learning PDEs operating on bounded domains. By developing a spectral framework for learning both PDEs and integro-differential equations, we extend machine learning methods to apply to unbounded DEs and a larger class of problems.
    AdvDiff: Generating Unrestricted Adversarial Examples using Diffusion Models. (arXiv:2307.12499v2 [cs.LG] UPDATED)
Unrestricted adversarial attacks present a serious threat to deep learning models and adversarial defense techniques. They pose severe security problems for deep learning applications because they can effectively bypass defense mechanisms. However, previous attack methods often utilize Generative Adversarial Networks (GANs), which lack theoretical guarantees and can thus generate unrealistic examples when incorporating adversarial objectives, especially for large-scale datasets like ImageNet. In this paper, we propose a new method, called AdvDiff, to generate unrestricted adversarial examples with diffusion models. We design two novel adversarial guidance techniques to conduct adversarial sampling in the reverse generation process of diffusion models. These two techniques are effective and stable, generating high-quality, realistic adversarial examples by interpretably integrating gradients of the target classifier. Experimental results on the MNIST and ImageNet datasets demonstrate that AdvDiff is effective at generating unrestricted adversarial examples, outperforming GAN-based methods in terms of attack performance and generation quality.
    Adversarial Examples Might be Avoidable: The Role of Data Concentration in Adversarial Robustness. (arXiv:2309.16096v1 [cs.LG])
    The susceptibility of modern machine learning classifiers to adversarial examples has motivated theoretical results suggesting that these might be unavoidable. However, these results can be too general to be applicable to natural data distributions. Indeed, humans are quite robust for tasks involving vision. This apparent conflict motivates a deeper dive into the question: Are adversarial examples truly unavoidable? In this work, we theoretically demonstrate that a key property of the data distribution -- concentration on small-volume subsets of the input space -- determines whether a robust classifier exists. We further demonstrate that, for a data distribution concentrated on a union of low-dimensional linear subspaces, exploiting data structure naturally leads to classifiers that enjoy good robustness guarantees, improving upon methods for provable certification in certain regimes.
    Infer and Adapt: Bipedal Locomotion Reward Learning from Demonstrations via Inverse Reinforcement Learning. (arXiv:2309.16074v1 [cs.RO])
Enabling bipedal walking robots to learn how to maneuver over highly uneven, dynamically changing terrains is challenging due to the complexity of robot dynamics and of the environments the robot interacts with. Recent advancements in learning from demonstrations have shown promising results for robot learning in complex environments. While imitation learning of expert policies has been well-explored, the study of learning expert reward functions is largely under-explored in legged locomotion. This paper brings state-of-the-art Inverse Reinforcement Learning (IRL) techniques to solving bipedal locomotion problems over complex terrains. We propose algorithms for learning expert reward functions, and we subsequently analyze the learned functions. Through nonlinear function approximation, we uncover meaningful insights into the expert's locomotion strategies. Furthermore, we empirically demonstrate that training a bipedal locomotion policy with the inferred reward functions enhances its walking performance on unseen terrains, highlighting the adaptability offered by reward learning.
    VAE-based latent-space classification of RNO-G data. (arXiv:2309.16401v1 [astro-ph.HE])
    The Radio Neutrino Observatory in Greenland (RNO-G) is a radio-based ultra-high energy neutrino detector located at Summit Station, Greenland. It is still being constructed, with 7 stations currently operational. Neutrino detection works by measuring Askaryan radiation produced by neutrino-nucleon interactions. A neutrino candidate must be found amidst other backgrounds which are recorded at much higher rates -- including cosmic-rays and anthropogenic noise -- the origins of which are sometimes unknown. Here we describe a method to classify different noise classes using the latent space of a variational autoencoder. The latent space forms a compact representation that makes classification tractable. We analyze data from a noisy and a silent station. The method automatically detects and allows us to qualitatively separate multiple event classes, including physical wind-induced signals, for both the noisy and the quiet station.
    Contrastive Learning of Temporal Distinctiveness for Survival Analysis in Electronic Health Records. (arXiv:2308.13104v2 [cs.LG] UPDATED)
    Survival analysis plays a crucial role in many healthcare decisions, where the risk prediction for the events of interest can support an informative outlook for a patient's medical journey. Given the existence of data censoring, an effective way of survival analysis is to enforce the pairwise temporal concordance between censored and observed data, aiming to utilize the time interval before censoring as partially observed time-to-event labels for supervised learning. Although existing studies mostly employed ranking methods to pursue an ordering objective, contrastive methods which learn a discriminative embedding by having data contrast against each other, have not been explored thoroughly for survival analysis. Therefore, in this paper, we propose a novel Ontology-aware Temporality-based Contrastive Survival (OTCSurv) analysis framework that utilizes survival durations from both censored and observed data to define temporal distinctiveness and construct negative sample pairs with adjustable hardness for contrastive learning. Specifically, we first use an ontological encoder and a sequential self-attention encoder to represent the longitudinal EHR data with rich contexts. Second, we design a temporal contrastive loss to capture varying survival durations in a supervised setting through a hardness-aware negative sampling mechanism. Last, we incorporate the contrastive task into the time-to-event predictive task with multiple loss components. We conduct extensive experiments using a large EHR dataset to forecast the risk of hospitalized patients who are in danger of developing acute kidney injury (AKI), a critical and urgent medical condition. The effectiveness and explainability of the proposed model are validated through comprehensive quantitative and qualitative studies.
    Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation. (arXiv:2309.16429v1 [cs.LG])
    We consider the task of generating diverse and realistic videos guided by natural audio samples from a wide variety of semantic classes. For this task, the videos are required to be aligned both globally and temporally with the input audio: globally, the input audio is semantically associated with the entire output video, and temporally, each segment of the input audio is associated with a corresponding segment of that video. We utilize an existing text-conditioned video generation model and a pre-trained audio encoder model. The proposed method is based on a lightweight adaptor network, which learns to map the audio-based representation to the input representation expected by the text-to-video generation model. As such, it also enables video generation conditioned on text, audio, and, for the first time as far as we can ascertain, on both text and audio. We validate our method extensively on three datasets demonstrating significant semantic diversity of audio-video samples and further propose a novel evaluation metric (AV-Align) to assess the alignment of generated videos with input audio samples. AV-Align is based on the detection and comparison of energy peaks in both modalities. In comparison to recent state-of-the-art approaches, our method generates videos that are better aligned with the input sound, both with respect to content and temporal axis. We also show that videos produced by our method present higher visual quality and are more diverse.
    Graph-level Representation Learning with Joint-Embedding Predictive Architectures. (arXiv:2309.16014v1 [cs.LG])
    Joint-Embedding Predictive Architectures (JEPAs) have recently emerged as a novel and powerful technique for self-supervised representation learning. They aim to learn an energy-based model by predicting the latent representation of a target signal $y$ from a context signal $x$. JEPAs bypass the need for data augmentation and negative samples, which are typically required by contrastive learning, while avoiding the overfitting issues associated with generative-based pretraining. In this paper, we show that graph-level representations can be effectively modeled using this paradigm and propose Graph-JEPA, the first JEPA for the graph domain. In particular, we employ masked modeling to learn embeddings for different subgraphs of the input graph. To endow the representations with the implicit hierarchy that is often present in graph-level concepts, we devise an alternative training objective that consists of predicting the coordinates of the encoded subgraphs on the unit hyperbola in the 2D plane. Extensive validation shows that Graph-JEPA can learn representations that are expressive and competitive in both graph classification and regression problems.
    Causal Policy Gradient for Whole-Body Mobile Manipulation. (arXiv:2305.04866v4 [cs.RO] UPDATED)
    Developing the next generation of household robot helpers requires combining locomotion and interaction capabilities, which is generally referred to as mobile manipulation (MoMa). MoMa tasks are difficult due to the large action space of the robot and the common multi-objective nature of the task, e.g., efficiently reaching a goal while avoiding obstacles. Current approaches often segregate tasks into navigation without manipulation and stationary manipulation without locomotion by manually matching parts of the action space to MoMa sub-objectives (e.g. learning base actions for locomotion objectives and learning arm actions for manipulation). This solution prevents simultaneous combinations of locomotion and interaction degrees of freedom and requires human domain knowledge for both partitioning the action space and matching the action parts to the sub-objectives. In this paper, we introduce Causal MoMa, a new reinforcement learning framework to train policies for typical MoMa tasks that makes use of the most favorable subspace of the robot's action space to address each sub-objective. Causal MoMa automatically discovers the causal dependencies between actions and terms of the reward function and exploits these dependencies through causal policy gradient that reduces gradient variance compared to previous state-of-the-art reinforcement learning algorithms, improving convergence and results. We evaluate the performance of Causal MoMa on three types of simulated robots across different MoMa tasks and demonstrate success in transferring the policies trained in simulation directly to a real robot, where our agent is able to follow moving goals and react to dynamic obstacles while simultaneously and synergistically controlling the whole-body: base, arm, and head. More information at https://sites.google.com/view/causal-moma.
    Tiny Classifier Circuits: Evolving Accelerators for Tabular Data. (arXiv:2303.00031v2 [cs.AR] UPDATED)
    A typical machine learning (ML) development cycle for edge computing is to maximise the performance during model training and then minimise the memory/area footprint of the trained model for deployment on edge devices targeting CPUs, GPUs, microcontrollers, or custom hardware accelerators. This paper proposes a methodology for automatically generating predictor circuits for classification of tabular data with comparable prediction performance to conventional ML techniques while using substantially fewer hardware resources and power. The proposed methodology uses an evolutionary algorithm to search over the space of logic gates and automatically generates a classifier circuit with maximised training prediction accuracy. Classifier circuits are so tiny (i.e., consisting of no more than 300 logic gates) that they are called "Tiny Classifier" circuits, and can efficiently be implemented in ASIC or on an FPGA. We empirically evaluate the automatic Tiny Classifier circuit generation methodology or "Auto Tiny Classifiers" on a wide range of tabular datasets, and compare it against conventional ML techniques such as Amazon's AutoGluon, Google's TabNet and a neural search over Multi-Layer Perceptrons. Despite Tiny Classifiers being constrained to a few hundred logic gates, we observe no statistically significant difference in prediction performance in comparison to the best-performing ML baseline. When synthesised as a Silicon chip, Tiny Classifiers use 8-18x less area and 4-8x less power. When implemented as an ultra-low cost chip on a flexible substrate (i.e., FlexIC), they occupy 10-75x less area and consume 13-75x less power compared to the most hardware-efficient ML baseline. On an FPGA, Tiny Classifiers consume 3-11x fewer resources.
    General In-Hand Object Rotation with Vision and Touch. (arXiv:2309.09979v2 [cs.RO] UPDATED)
    We introduce RotateIt, a system that enables fingertip-based object rotation along multiple axes by leveraging multimodal sensory inputs. Our system is trained in simulation, where it has access to ground-truth object shapes and physical properties. Then we distill it to operate on realistic yet noisy simulated visuotactile and proprioceptive sensory inputs. These multimodal inputs are fused via a visuotactile transformer, enabling online inference of object shapes and physical properties during deployment. We show significant performance improvements over prior methods and the importance of visual and tactile sensing.
    Enhancing Speech Articulation Analysis using a Geometric Transformation of the X-ray Microbeam Dataset. (arXiv:2305.10775v3 [eess.AS] UPDATED)
Accurate analysis of speech articulation is crucial for speech analysis. However, the X-Y coordinates of articulators strongly depend on the anatomy of the speakers and the variability of pellet placements, and existing methods for mapping anatomical landmarks in the X-ray Microbeam Dataset (XRMB) fail to capture the entire anatomy of the vocal tract. In this paper, we propose a new geometric transformation that improves the accuracy of these measurements. Our transformation maps the X-Y coordinates of anatomical landmarks along the midsagittal plane onto six relative measures: Lip Aperture (LA), Lip Protrusion (LP), Tongue Body Constriction Location (TBCL) and Degree (TBCD), and Tongue Tip Constriction Location (TTCL) and Degree (TTCD). Our novel contribution is the extension of the palate trace towards the inferred anterior pharyngeal line, which improves measurements of tongue body constriction.
    Efficiency Separation between RL Methods: Model-Free, Model-Based and Goal-Conditioned. (arXiv:2309.16291v1 [cs.LG])
    We prove a fundamental limitation on the efficiency of a wide class of Reinforcement Learning (RL) algorithms. This limitation applies to model-free RL methods as well as a broad range of model-based methods, such as planning with tree search. Under an abstract definition of this class, we provide a family of RL problems for which these methods suffer a lower bound exponential in the horizon for their interactions with the environment to find an optimal behavior. However, there exists a method, not tailored to this specific family of problems, which can efficiently solve the problems in the family. In contrast, our limitation does not apply to several types of methods proposed in the literature, for instance, goal-conditioned methods or other algorithms that construct an inverse dynamics model.  ( 2 min )
    Feature Normalization Prevents Collapse of Non-contrastive Learning Dynamics. (arXiv:2309.16109v1 [cs.LG])
    Contrastive learning is a self-supervised representation learning framework, where two positive views generated through data augmentation are made similar by an attraction force in a data representation space, while a repulsive force makes them far from negative examples. Non-contrastive learning, represented by BYOL and SimSiam, further gets rid of negative examples and improves computational efficiency. While learned representations may collapse into a single point due to the lack of the repulsive force at first sight, Tian et al. (2021) revealed through the learning dynamics analysis that the representations can avoid collapse if data augmentation is sufficiently stronger than regularization. However, their analysis does not take into account commonly-used feature normalization, a normalizer before measuring the similarity of representations, and hence excessively strong regularization may collapse the dynamics, which is an unnatural behavior under the presence of feature normalization. Therefore, we extend the previous theory based on the L2 loss by considering the cosine loss, which involves feature normalization. We show that the cosine loss induces sixth-order dynamics (while the L2 loss induces a third-order one), in which a stable equilibrium dynamically emerges even if there are only collapsed solutions with given initial parameters. Thus, we offer a new understanding that feature normalization plays an important role in robustly preventing the dynamics collapse.
    Multi-Swap $k$-Means++. (arXiv:2309.16384v1 [cs.CG])
The $k$-means++ algorithm of Arthur and Vassilvitskii (SODA 2007) is often the practitioners' choice algorithm for optimizing the popular $k$-means clustering objective and is known to give an $O(\log k)$-approximation in expectation. To obtain higher quality solutions, Lattanzi and Sohler (ICML 2019) proposed augmenting $k$-means++ with $O(k \log \log k)$ local search steps obtained through the $k$-means++ sampling distribution to yield a $c$-approximation to the $k$-means clustering problem, where $c$ is a large absolute constant. Here we generalize and extend their local search algorithm by considering larger and more sophisticated local search neighborhoods, hence allowing multiple centers to be swapped at the same time. Our algorithm achieves a $9 + \varepsilon$ approximation ratio, which is the best possible for local search. Importantly, we show that our approach yields substantial practical improvements, with significant quality gains over the approach of Lattanzi and Sohler (ICML 2019) on several datasets.
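For context, the $k$-means++ ($D^2$) seeding distribution that both the original local search and the multi-swap variant sample from can be sketched as follows; this is the standard textbook procedure, not the paper's multi-swap algorithm itself.

```python
# Sketch: k-means++ (D^2) seeding in numpy.
import numpy as np

def kmeans_pp_seeding(X, k, seed=0):
    """X: (n, d) data matrix. Returns k initial centers sampled with D^2 weighting."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        diffs = X[:, None, :] - np.asarray(centers)[None, :, :]
        d2 = np.min((diffs ** 2).sum(axis=-1), axis=1)     # squared distance to nearest chosen center
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.asarray(centers)
```

The local search described above then repeatedly samples candidate centers from this distribution and accepts swaps (of several centers at once, in the multi-swap variant) whenever they reduce the $k$-means cost.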
    A parsimonious, computationally efficient machine learning method for spatial regression. (arXiv:2309.16448v1 [stat.ML])
We introduce the modified planar rotator method (MPRS), a physically inspired machine learning method for spatial/temporal regression. MPRS is a non-parametric model which incorporates spatial or temporal correlations via short-range, distance-dependent ``interactions'' without assuming a specific form for the underlying probability distribution. Predictions are obtained by means of a fully autonomous learning algorithm which employs equilibrium conditional Monte Carlo simulations. MPRS is able to handle scattered data and arbitrary spatial dimensions. We report tests on various synthetic and real-world data in one, two and three dimensions which demonstrate that the MPRS prediction performance (without parameter tuning) is competitive with standard interpolation methods such as ordinary kriging and inverse distance weighting. Notably, MPRS is a particularly effective gap-filling method for rough and non-Gaussian data (e.g., daily precipitation time series). MPRS shows superior computational efficiency and scalability for large samples. Massive data sets involving millions of nodes can be processed in a few seconds on a standard personal computer.
    A Metaheuristic for Amortized Search in High-Dimensional Parameter Spaces. (arXiv:2309.16465v1 [q-bio.QM])
    Parameter inference for dynamical models of (bio)physical systems remains a challenging problem. Intractable gradients, high-dimensional spaces, and non-linear model functions are typically problematic without large computational budgets. A recent body of work in that area has focused on Bayesian inference methods, which consider parameters under their statistical distributions and therefore, do not derive point estimates of optimal parameter values. Here we propose a new metaheuristic that drives dimensionality reductions from feature-informed transformations (DR-FFIT) to address these bottlenecks. DR-FFIT implements an efficient sampling strategy that facilitates a gradient-free parameter search in high-dimensional spaces. We use artificial neural networks to obtain differentiable proxies for the model's features of interest. The resulting gradients enable the estimation of a local active subspace of the model within a defined sampling region. This approach enables efficient dimensionality reductions of highly non-linear search spaces at a low computational cost. Our test data show that DR-FFIT boosts the performances of random-search and simulated-annealing against well-established metaheuristics, and improves the goodness-of-fit of the model, all within contained run-time costs.
    Exploring Self-Supervised Contrastive Learning of Spatial Sound Event Representation. (arXiv:2309.15938v1 [eess.AS])
    In this study, we present a simple multi-channel framework for contrastive learning (MC-SimCLR) to encode 'what' and 'where' of spatial audios. MC-SimCLR learns joint spectral and spatial representations from unlabeled spatial audios, thereby enhancing both event classification and sound localization in downstream tasks. At its core, we propose a multi-level data augmentation pipeline that augments different levels of audio features, including waveforms, Mel spectrograms, and generalized cross-correlation (GCC) features. In addition, we introduce simple yet effective channel-wise augmentation methods to randomly swap the order of the microphones and mask Mel and GCC channels. By using these augmentations, we find that linear layers on top of the learned representation significantly outperform supervised models in terms of both event classification accuracy and localization error. We also perform a comprehensive analysis of the effect of each augmentation method and a comparison of the fine-tuning performance using different amounts of labeled data.
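The channel-wise augmentations described above are simple to sketch; the shapes and masking probability below are illustrative, and in MC-SimCLR they would be applied alongside the waveform-, Mel-, and GCC-level augmentations.

```python
# Sketch: randomly permute microphone channels and zero-mask whole feature channels.
import torch

def channel_swap(feats):
    """feats: (channels, mel_bins, time). Randomly reorder the microphone channels."""
    perm = torch.randperm(feats.size(0))
    return feats[perm]

def channel_mask(feats, p=0.25):
    """Zero out each channel independently with probability p."""
    keep = (torch.rand(feats.size(0)) > p).float().view(-1, 1, 1)
    return feats * keep
```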
    Learning to Transform for Generalizable Instance-wise Invariance. (arXiv:2309.16672v1 [cs.CV])
    Computer vision research has long aimed to build systems that are robust to spatial transformations found in natural data. Traditionally, this is done using data augmentation or hard-coding invariances into the architecture. However, too much or too little invariance can hurt, and the correct amount is unknown a priori and dependent on the instance. Ideally, the appropriate invariance would be learned from data and inferred at test-time. We treat invariance as a prediction problem. Given any image, we use a normalizing flow to predict a distribution over transformations and average the predictions over them. Since this distribution only depends on the instance, we can align instances before classifying them and generalize invariance across classes. The same distribution can also be used to adapt to out-of-distribution poses. This normalizing flow is trained end-to-end and can learn a much larger range of transformations than Augerino and InstaAug. When used as data augmentation, our method shows accuracy and robustness gains on CIFAR 10, CIFAR10-LT, and TinyImageNet.
    TraCE: Trajectory Counterfactual Explanation Scores. (arXiv:2309.15965v1 [cs.LG])
    Counterfactual explanations, and their associated algorithmic recourse, are typically leveraged to understand, explain, and potentially alter a prediction coming from a black-box classifier. In this paper, we propose to extend the use of counterfactuals to evaluate progress in sequential decision making tasks. To this end, we introduce a model-agnostic modular framework, TraCE (Trajectory Counterfactual Explanation) scores, which is able to distill and condense progress in highly complex scenarios into a single value. We demonstrate TraCE's utility across domains by showcasing its main properties in two case studies spanning healthcare and climate change.
    AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model. (arXiv:2309.16058v1 [cs.LG])
    We present Any-Modality Augmented Language Model (AnyMAL), a unified model that reasons over diverse input modality signals (i.e. text, image, video, audio, IMU motion sensor), and generates textual responses. AnyMAL inherits the powerful text-based reasoning abilities of the state-of-the-art LLMs including LLaMA-2 (70B), and converts modality-specific signals to the joint textual space through a pre-trained aligner module. To further strengthen the multimodal LLM's capabilities, we fine-tune the model with a multimodal instruction set manually collected to cover diverse topics and tasks beyond simple QAs. We conduct comprehensive empirical analysis comprising both human and automatic evaluations, and demonstrate state-of-the-art performance on various multimodal tasks.
    Uncertainty-Aware Decision Transformer for Stochastic Driving Environments. (arXiv:2309.16397v1 [cs.LG])
    Offline Reinforcement Learning (RL) has emerged as a promising framework for learning policies without active interactions, making it especially appealing for autonomous driving tasks. Recent successes of Transformers inspire casting offline RL as sequence modeling, which performs well in long-horizon tasks. However, such models can be overly optimistic in stochastic environments, as they incorrectly assume that the same goal can be consistently achieved by identical actions. In this paper, we introduce an UNcertainty-awaRE deciSion Transformer (UNREST) for planning in stochastic driving environments without introducing additional transition or complex generative models. Specifically, UNREST estimates state uncertainties by the conditional mutual information between transitions and returns, and segments sequences accordingly. Discovering the 'uncertainty accumulation' and 'temporal locality' properties of driving environments, UNREST replaces the global returns in decision transformers with less uncertain truncated returns, to learn from true outcomes of agent actions rather than environment transitions. We also dynamically evaluate environmental uncertainty during inference for cautious planning. Extensive experimental results demonstrate UNREST's superior performance in various driving scenarios and the power of our uncertainty estimation strategy.
    Axiomatic Aggregations of Abductive Explanations. (arXiv:2109.03890v5 [cs.LG] UPDATED)
    The recent criticisms of the robustness of post hoc model approximation explanation methods (like LIME and SHAP) have led to the rise of model-precise abductive explanations. For each data point, abductive explanations provide a minimal subset of features that are sufficient to generate the outcome. While theoretically sound and rigorous, abductive explanations suffer from a major issue -- there can be several valid abductive explanations for the same data point. In such cases, providing a single abductive explanation can be insufficient; on the other hand, providing all valid abductive explanations can be incomprehensible due to their size. In this work, we solve this issue by aggregating the many possible abductive explanations into feature importance scores. We propose three aggregation methods: two based on power indices from cooperative game theory and a third based on a well-known measure of causal strength. We characterize these three methods axiomatically, showing that each of them uniquely satisfies a set of desirable properties. We also evaluate them on multiple datasets and show that these explanations are robust to the attacks that fool SHAP and LIME.
    Learning Dissipative Neural Dynamical Systems. (arXiv:2309.16032v1 [cs.LG])
    Consider an unknown nonlinear dynamical system that is known to be dissipative. The objective of this paper is to learn a neural dynamical model that approximates this system, while preserving the dissipativity property in the model. In general, imposing dissipativity constraints during neural network training is a hard problem for which no known techniques exist. In this work, we address the problem of learning a dissipative neural dynamical system model in two stages. First, we learn an unconstrained neural dynamical model that closely approximates the system dynamics. Next, we derive sufficient conditions to perturb the weights of the neural dynamical model to ensure dissipativity, followed by perturbation of the biases to retain the fit of the model to the trajectories of the nonlinear system. We show that these two perturbation problems can be solved independently to obtain a neural dynamical model that is guaranteed to be dissipative while closely approximating the nonlinear system.
    Augmenting LLMs with Knowledge: A survey on hallucination prevention. (arXiv:2309.16459v1 [cs.CL])
    Large pre-trained language models have demonstrated their proficiency in storing factual knowledge within their parameters and achieving remarkable results when fine-tuned for downstream natural language processing tasks. Nonetheless, their capacity to access and manipulate knowledge with precision remains constrained, resulting in performance disparities on knowledge-intensive tasks when compared to task-specific architectures. Additionally, the challenges of providing provenance for model decisions and maintaining up-to-date world knowledge persist as open research frontiers. To address these limitations, the integration of pre-trained models with differentiable access mechanisms to explicit non-parametric memory emerges as a promising solution. This survey delves into the realm of language models (LMs) augmented with the ability to tap into external knowledge sources, including external knowledge bases and search engines. While adhering to the standard objective of predicting missing tokens, these augmented LMs leverage diverse, possibly non-parametric external modules to augment their contextual processing capabilities, departing from the conventional language modeling paradigm. Through an exploration of current advancements in augmenting large language models with knowledge, this work concludes that this emerging research direction holds the potential to address prevalent issues in traditional LMs, such as hallucinations, un-grounded responses, and scalability challenges.
    A Design Toolbox for the Development of Collaborative Distributed Machine Learning Systems. (arXiv:2309.16584v1 [cs.MA])
    To train machine learning (ML) models on data held by multiple parties in a confidentiality-preserving way, various collaborative distributed machine learning (CDML) system designs have been developed, for example to perform assisted learning, federated learning, and split learning. CDML system designs show different traits, for example high agent autonomy, ML model confidentiality, and fault tolerance. Facing a wide variety of CDML system designs with different traits, it is difficult for developers to design CDML systems with traits that match use case requirements in a targeted way. However, inappropriate CDML system designs may result in CDML systems failing their envisioned purposes. We developed a CDML design toolbox that can guide the development of CDML systems. Based on the CDML design toolbox, we present CDML system archetypes with distinct key traits that can support the design of CDML systems to meet use case requirements.
    Two-Step Active Learning for Instance Segmentation with Uncertainty and Diversity Sampling. (arXiv:2309.16139v1 [cs.CV])
    Training high-quality instance segmentation models requires an abundance of labeled images with instance masks and classifications, which is often expensive to procure. Active learning addresses this challenge by striving for optimum performance with minimal labeling cost by selecting the most informative and representative images for labeling. Despite its potential, active learning has been less explored in instance segmentation compared to other tasks like image classification, which require less labeling. In this study, we propose a post-hoc active learning algorithm that integrates uncertainty-based sampling with diversity-based sampling. Our proposed algorithm is not only simple and easy to implement, but it also delivers superior performance on various datasets. Its practical application is demonstrated on a real-world overhead imagery dataset, where it increases the labeling efficiency fivefold.
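    To make the two-step acquisition concrete, here is a minimal sketch that first filters by uncertainty and then enforces diversity with k-means; the embedding source, uncertainty score, and pool size are assumptions for illustration and not the authors' exact algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

def two_step_selection(embeddings, uncertainties, budget, pool_factor=5, seed=0):
    """Illustrative two-step acquisition: uncertainty filtering, then diversity.

    embeddings    : (n, d) image embeddings from the current model
    uncertainties : (n,) per-image uncertainty scores (e.g. mean mask entropy)
    budget        : number of images to send for labeling
    Assumes the unlabeled pool holds at least budget * pool_factor images.
    """
    # Step 1: keep only the most uncertain candidates.
    pool = np.argsort(-uncertainties)[: budget * pool_factor]

    # Step 2: cluster the candidates and pick the most uncertain one per cluster.
    labels = KMeans(n_clusters=budget, n_init=10, random_state=seed).fit_predict(
        embeddings[pool]
    )
    selected = []
    for c in range(budget):
        members = pool[labels == c]
        selected.append(members[np.argmax(uncertainties[members])])
    return np.array(selected)
```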
    Robust Offline Reinforcement Learning -- Certify the Confidence Interval. (arXiv:2309.16631v1 [cs.LG])
    Reinforcement learning (RL), especially deep RL, has received increasing research attention. However, the security of RL has become a pressing concern as attack techniques mature. To defend against such adversarial attacks, several practical approaches have been developed, such as adversarial training and data filtering. However, these methods are mostly based on empirical algorithms and experiments, without rigorous theoretical analysis of the robustness of the algorithms. In this paper, we develop an algorithm to certify the robustness of a given policy offline via random smoothing; the certification is provable and can be computed as efficiently as evaluation without random smoothing. Experiments on different environments confirm the correctness of our algorithm.
    Infinite Neural Network Quantum States: Entanglement and Training Dynamics. (arXiv:2112.00723v2 [quant-ph] UPDATED)
    We study infinite limits of neural network quantum states ($\infty$-NNQS), which exhibit representation power through ensemble statistics, and also tractable gradient descent dynamics. Ensemble averages of Renyi entropies are expressed in terms of neural network correlators, and architectures that exhibit volume-law entanglement are presented. A general framework is developed for studying the gradient descent dynamics of neural network quantum states (NNQS), using a quantum state neural tangent kernel (QS-NTK). For $\infty$-NNQS the training dynamics is simplified, since the QS-NTK becomes deterministic and constant. An analytic solution is derived for quantum state supervised learning, which allows an $\infty$-NNQS to recover any target wavefunction. Numerical experiments on finite and infinite NNQS in the transverse field Ising model and Fermi Hubbard model demonstrate excellent agreement with theory. $\infty$-NNQS opens up new opportunities for studying entanglement and training dynamics in other physics applications, such as in finding ground states.
    Navigating Healthcare Insights: A Birds Eye View of Explainability with Knowledge Graphs. (arXiv:2309.16593v1 [cs.AI])
    Knowledge graphs (KGs) are gaining prominence in Healthcare AI, especially in drug discovery and pharmaceutical research as they provide a structured way to integrate diverse information sources, enhancing AI system interpretability. This interpretability is crucial in healthcare, where trust and transparency matter, and eXplainable AI (XAI) supports decision making for healthcare professionals. This overview summarizes recent literature on the impact of KGs in healthcare and their role in developing explainable AI models. We cover KG workflow, including construction, relationship extraction, reasoning, and their applications in areas like Drug-Drug Interactions (DDI), Drug Target Interactions (DTI), Drug Development (DD), Adverse Drug Reactions (ADR), and bioinformatics. We emphasize the importance of making KGs more interpretable through knowledge-infused learning in healthcare. Finally, we highlight research challenges and provide insights for future directions.
    Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit. (arXiv:2309.16620v1 [stat.ML])
    The cost of hyperparameter tuning in deep learning has been rising with model sizes, prompting practitioners to find new tuning methods using a proxy of smaller networks. One such proposal uses $\mu$P parameterized networks, where the optimal hyperparameters for small width networks transfer to networks with arbitrarily large width. However, in this scheme, hyperparameters do not transfer across depths. As a remedy, we study residual networks with a residual branch scale of $1/\sqrt{\text{depth}}$ in combination with the $\mu$P parameterization. We provide experiments demonstrating that residual architectures including convolutional ResNets and Vision Transformers trained with this parameterization exhibit transfer of optimal hyperparameters across width and depth on CIFAR-10 and ImageNet. Furthermore, our empirical findings are supported and motivated by theory. Using recent developments in the dynamical mean field theory (DMFT) description of neural network learning dynamics, we show that this parameterization of ResNets admits a well-defined feature learning joint infinite-width and infinite-depth limit and show convergence of finite-size network dynamics towards this limit.
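    A minimal sketch of the depth scaling discussed above: each residual branch output is multiplied by $1/\sqrt{\text{depth}}$. The block below is illustrative only; it omits the $\mu$P width parameterization (initialization and per-layer learning-rate scaling) that the paper combines with this scaling.

```python
import torch
import torch.nn as nn

class ScaledResidualBlock(nn.Module):
    """Residual block whose branch output is scaled by 1/sqrt(depth)."""

    def __init__(self, width: int, depth: int):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Linear(width, width), nn.ReLU(), nn.Linear(width, width)
        )
        self.scale = depth ** -0.5   # 1/sqrt(depth) branch scaling

    def forward(self, x):
        return x + self.scale * self.branch(x)

depth = 32
net = nn.Sequential(*[ScaledResidualBlock(256, depth) for _ in range(depth)])
out = net(torch.randn(8, 256))
```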
    AtomSurf : Surface Representation for Learning on Protein Structures. (arXiv:2309.16519v1 [cs.LG])
    Recent advancements in Cryo-EM and protein structure prediction algorithms have made large-scale protein structures accessible, paving the way for machine learning-based functional annotations. The field of geometric deep learning focuses on creating methods working on geometric data. An essential aspect of learning from protein structures is representing these structures as a geometric object (be it a grid, graph, or surface) and applying a learning method tailored to this representation. The performance of a given approach will then depend on both the representation and its corresponding learning method. In this paper, we investigate representing proteins as $\textit{3D mesh surfaces}$ and incorporate them into an established representation benchmark. Our first finding is that despite promising preliminary results, the surface representation alone does not seem competitive with 3D grids. Building on this, we introduce a synergistic approach, combining surface representations with graph-based methods, resulting in a general framework that incorporates both representations in learning. We show that using this combination, we are able to obtain state-of-the-art results across $\textit{all tested tasks}$. Our code and data can be found online: https://github.com/Vincentx15/atom2D.
    Nonlinear MPC design for incrementally ISS systems with application to GRU networks. (arXiv:2309.16428v1 [eess.SY])
    This brief addresses the design of a Nonlinear Model Predictive Control (NMPC) strategy for exponentially incremental Input-to-State Stable (ISS) systems. In particular, a novel formulation is devised, which does not necessitate the onerous computation of terminal ingredients, but rather relies on the explicit definition of a minimum prediction horizon ensuring closed-loop stability. The designed methodology is particularly suited for the control of systems learned by Recurrent Neural Networks (RNNs), which are known for their enhanced modeling capabilities and for which the incremental ISS properties can be studied thanks to simple algebraic conditions. The approach is applied to Gated Recurrent Unit (GRU) networks, providing also a method for the design of a tailored state observer with convergence guarantees. The resulting control architecture is tested on a benchmark system, demonstrating its good control performances and efficient applicability.
    Uncertainty Quantification for Eosinophil Segmentation. (arXiv:2309.16536v1 [eess.IV])
    Eosinophilic Esophagitis (EoE) is an allergic condition increasing in prevalence. To diagnose EoE, pathologists must find 15 or more eosinophils within a single high-power field (400X magnification). Determining whether or not a patient has EoE can be an arduous process and any medical imaging approaches used to assist diagnosis must consider both efficiency and precision. We propose an improvement of Adorno et al.'s approach for quantifying eosinophils using deep image segmentation. Our new approach leverages Monte Carlo Dropout, a common approach in deep learning to reduce overfitting, to provide uncertainty quantification on current deep learning models. The uncertainty can be visualized in an output image to evaluate model performance, provide insight into how deep learning algorithms function, and assist pathologists in identifying eosinophils.
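    The Monte Carlo Dropout idea used here is standard and can be sketched as follows; the model, number of samples, and sigmoid output head are assumptions for illustration rather than the authors' exact segmentation network.

```python
import torch

def mc_dropout_predict(model, image, n_samples=20):
    """Monte Carlo Dropout: keep dropout active at test time and average.

    Returns the mean segmentation probability map and a per-pixel variance
    map that can be overlaid on the image as an uncertainty visualization.
    """
    model.eval()
    # Re-enable only the dropout layers.
    for m in model.modules():
        if isinstance(m, torch.nn.Dropout):
            m.train()
    with torch.no_grad():
        probs = torch.stack(
            [torch.sigmoid(model(image)) for _ in range(n_samples)]
        )
    return probs.mean(dim=0), probs.var(dim=0)
```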
    Systematic Sampling and Validation of Machine Learning-Parameterizations in Climate Models. (arXiv:2309.16177v1 [physics.ao-ph])
    Progress in hybrid physics-machine learning (ML) climate simulations has been limited by the difficulty of obtaining performant coupled (i.e. online) simulations. While evaluating hundreds of ML parameterizations of subgrid closures (here of convection and radiation) offline is straightforward, online evaluation at the same scale is technically challenging. Our software automation achieves an order-of-magnitude larger sampling of online modeling errors than has previously been examined. Using this, we evaluate the hybrid climate model performance and define strategies to improve it. We show that model online performance improves when incorporating memory, a relative humidity input feature transformation, and additional input variables. We also reveal substantial variation in online error and inconsistencies between offline vs. online error statistics. The implication is that hundreds of candidate ML models should be evaluated online to detect the effects of parameterization design choices. This is considerably more sampling than tends to be reported in the current literature.
    Deep Single Models vs. Ensembles: Insights for a Fast Deployment of Parking Monitoring Systems. (arXiv:2309.16495v1 [cs.CV])
    Searching for available parking spots in high-density urban centers is a stressful task for drivers that can be mitigated by systems that know in advance the nearest parking space available. To this end, image-based systems offer cost advantages over other sensor-based alternatives (e.g., ultrasonic sensors), requiring less physical infrastructure for installation and maintenance. Despite recent deep learning advances, deploying intelligent parking monitoring is still a challenge since most approaches involve collecting and labeling large amounts of data, which is laborious and time-consuming. Our study aims to uncover the challenges in creating a global framework, trained using publicly available labeled parking lot images, that performs accurately across diverse scenarios, enabling parking space monitoring as a ready-to-use system to deploy in a new environment. Through exhaustive experiments involving different datasets and deep learning architectures, including fusion strategies and ensemble methods, we found that models trained on diverse datasets can achieve 95\% accuracy without the burden of data annotation and model training on the target parking lot.
    Dynamic Selection in Algorithmic Decision-making. (arXiv:2108.12547v3 [econ.EM] UPDATED)
    This paper identifies and addresses dynamic selection problems in online learning algorithms with endogenous data. In a contextual multi-armed bandit model, a novel bias (self-fulfilling bias) arises because the endogeneity of the data influences the choices of decisions, affecting the distribution of future data to be collected and analyzed. We propose an instrumental-variable-based algorithm to correct for the bias. It obtains true parameter values and attains low (logarithmic-like) regret levels. We also prove a central limit theorem for statistical inference. To establish the theoretical properties, we develop a general technique that untangles the interdependence between data and actions.
    Voting Network for Contour Levee Farmland Segmentation and Classification. (arXiv:2309.16561v1 [cs.CV])
    High-resolution aerial imagery allows fine details in the segmentation of farmlands. However, small objects and features introduce distortions to the delineation of object boundaries, and larger contextual views are needed to mitigate class confusion. In this work, we present an end-to-end trainable network for segmenting farmlands with contour levees from high-resolution aerial imagery. A fusion block is devised that includes multiple voting blocks to achieve image segmentation and classification. We integrate the fusion block with a backbone and produce both semantic predictions and segmentation slices. The segmentation slices are used to perform majority voting on the predictions. The network is trained to assign the most likely class label of a segment to its pixels, learning the concept of farmlands rather than analyzing constitutive pixels separately. We evaluate our method using images from the National Agriculture Imagery Program. Our method achieved an average accuracy of 94.34\%. Compared to the state-of-the-art methods, the proposed method obtains an improvement of 6.96% and 2.63% in the F1 score on average.
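    A hedged sketch of segment-level majority voting in the spirit described above: each segment takes the class most often predicted among its pixels. The array layout and class count are illustrative assumptions, not the paper's fusion block.

```python
import numpy as np

def segment_majority_vote(pixel_probs, segment_ids):
    """Assign each segment the class most often predicted for its pixels.

    pixel_probs : (H, W, C) per-pixel class probabilities
    segment_ids : (H, W) integer segment label for every pixel
    """
    pixel_pred = pixel_probs.argmax(axis=-1)
    out = np.empty_like(pixel_pred)
    for seg in np.unique(segment_ids):
        mask = segment_ids == seg
        votes = np.bincount(pixel_pred[mask], minlength=pixel_probs.shape[-1])
        out[mask] = votes.argmax()
    return out
```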
    Selective Nonparametric Regression via Testing. (arXiv:2309.16412v1 [stat.ML])
    Prediction with the possibility of abstention (or selective prediction) is an important problem for error-critical machine learning applications. While well-studied in the classification setup, selective approaches to regression are much less developed. In this work, we consider the nonparametric heteroskedastic regression problem and develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point. Unlike existing methods, the proposed procedure accounts not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor. We prove non-asymptotic bounds on the risk of the resulting estimator and show the existence of several different convergence regimes. Theoretical analysis is illustrated with a series of experiments on simulated and real-world data.
    Jointly Training Large Autoregressive Multimodal Models. (arXiv:2309.15564v2 [cs.LG] UPDATED)
    In recent years, advances in the large-scale pretraining of language and text-to-image models have revolutionized the field of machine learning. Yet, integrating these two modalities into a single, robust model capable of generating seamless multimodal outputs remains a significant challenge. To address this gap, we present the Joint Autoregressive Mixture (JAM) framework, a modular approach that systematically fuses existing text and image generation models. We also introduce a specialized, data-efficient instruction-tuning strategy, tailored for mixed-modal generation tasks. Our final instruct-tuned model demonstrates unparalleled performance in generating high-quality multimodal outputs and represents the first model explicitly designed for this purpose.
    CasIL: Cognizing and Imitating Skills via a Dual Cognition-Action Architecture. (arXiv:2309.16299v1 [cs.RO])
    Enabling robots to effectively imitate expert skills in long-horizon tasks such as locomotion and manipulation poses a long-standing challenge. Existing imitation learning (IL) approaches for robots still grapple with sub-optimal performance in complex tasks. In this paper, we consider how this challenge can be addressed within the human cognitive priors. Heuristically, we extend the usual notion of action to a dual Cognition (high-level)-Action (low-level) architecture by introducing intuitive human cognitive priors, and propose a novel skill IL framework through human-robot interaction, called Cognition-Action-based Skill Imitation Learning (CasIL), for the robotic agent to effectively cognize and imitate the critical skills from raw visual demonstrations. CasIL enables both cognition and action imitation, while high-level skill cognition explicitly guides low-level primitive actions, providing robustness and reliability to the entire skill IL process. We evaluated our method on MuJoCo and RLBench benchmarks, as well as on the obstacle avoidance and point-goal navigation tasks for quadrupedal robot locomotion. Experimental results show that our CasIL consistently achieves competitive and robust skill imitation capability compared to other counterparts in a variety of long-horizon robotic tasks.
    Method and Validation for Optimal Lineup Creation for Daily Fantasy Football Using Machine Learning and Linear Programming. (arXiv:2309.15253v2 [cs.LG] UPDATED)
    Daily fantasy sports (DFS) are weekly or daily online contests where real-game performances of individual players are converted to fantasy points (FPTS). Users select players for their lineup to maximize their FPTS within a set player salary cap. This paper focuses on (1) the development of a method to forecast NFL player performance under uncertainty and (2) determining an optimal lineup to maximize FPTS under a set salary limit. A supervised learning neural network was created and used to project FPTS based on past player performance (2018 NFL regular season for this work) prior to the upcoming week. These projected FPTS were used in a mixed integer linear program to find the optimal lineup. The performance of resultant lineups was compared to randomly-created lineups. On average, the optimal lineups outperformed the random lineups. The generated lineups were then compared to real-world lineups from users on DraftKings. The generated lineups generally fell in approximately the 31st percentile (median). The FPTS methods and predictions presented here can be further improved using this study as a baseline comparison.
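    The lineup optimization step can be sketched as a small mixed integer linear program, for example with PuLP; the player dictionary format is hypothetical and the position constraints of a real DraftKings contest are omitted, so this is only an illustration of the salary-cap formulation.

```python
import pulp

def optimal_lineup(players, salary_cap=50000, roster_size=9):
    """Pick the lineup maximizing projected FPTS under a salary cap.

    players : list of dicts with keys 'name', 'salary', 'fpts'
    (Position constraints of a real DFS contest are omitted for brevity.)
    """
    prob = pulp.LpProblem("dfs_lineup", pulp.LpMaximize)
    pick = [pulp.LpVariable(f"x_{i}", cat="Binary") for i in range(len(players))]

    # Objective: maximize total projected fantasy points.
    prob += pulp.lpSum(p["fpts"] * x for p, x in zip(players, pick))
    # Constraints: salary cap and roster size.
    prob += pulp.lpSum(p["salary"] * x for p, x in zip(players, pick)) <= salary_cap
    prob += pulp.lpSum(pick) == roster_size

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [p["name"] for p, x in zip(players, pick) if x.value() == 1]
```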
    Leveraging Pre-trained Language Models for Time Interval Prediction in Text-Enhanced Temporal Knowledge Graphs. (arXiv:2309.16357v1 [cs.LG])
    Most knowledge graph completion (KGC) methods learn latent representations of entities and relations of a given graph by mapping them into a vector space. Although the majority of these methods focus on static knowledge graphs, a large number of publicly available KGs contain temporal information stating the time instant/period over which a certain fact has been true. Such graphs are often known as temporal knowledge graphs. Furthermore, knowledge graphs may also contain textual descriptions of entities and relations. Both temporal information and textual descriptions are not taken into account during representation learning by static KGC methods, and only structural information of the graph is leveraged. Recently, some studies have used temporal information to improve link prediction, yet they do not exploit textual descriptions and do not support inductive inference (prediction on entities that have not been seen in training). We propose a novel framework called TEMT that exploits the power of pre-trained language models (PLMs) for text-enhanced temporal knowledge graph completion. The knowledge stored in the parameters of a PLM allows TEMT to produce rich semantic representations of facts and to generalize on previously unseen entities. TEMT leverages textual and temporal information available in a KG, treats them separately, and fuses them to get plausibility scores of facts. Unlike previous approaches, TEMT effectively captures dependencies across different time points and enables predictions on unseen entities. To assess the performance of TEMT, we carried out several experiments including time interval prediction, both in transductive and inductive settings, and triple classification. The experimental results show that TEMT is competitive with the state-of-the-art.
    HyperBO+: Pre-training a universal prior for Bayesian optimization with hierarchical Gaussian processes. (arXiv:2212.10538v2 [cs.LG] UPDATED)
    Bayesian optimization (BO), while proven highly effective for many black-box function optimization tasks, requires practitioners to carefully select priors that well model their functions of interest. Rather than specifying priors by hand, researchers have investigated transfer learning based methods to automatically learn the priors, e.g. multi-task BO (Swersky et al., 2013), few-shot BO (Wistuba and Grabocka, 2021) and HyperBO (Wang et al., 2022). However, those prior learning methods typically assume that the input domains are the same for all tasks, weakening their ability to use observations on functions with different domains or generalize the learned priors to BO on different search spaces. In this work, we present HyperBO+: a pre-training approach for hierarchical Gaussian processes that enables the same prior to work universally for Bayesian optimization on functions with different domains. We propose a two-step pre-training method and analyze its appealing asymptotic properties and benefits to BO both theoretically and empirically. On real-world hyperparameter tuning tasks that involve multiple search spaces, we demonstrate that HyperBO+ is able to generalize to unseen search spaces and achieves lower regrets than competitive baselines.
    Quantum Self-Attention Neural Networks for Text Classification. (arXiv:2205.05625v2 [quant-ph] UPDATED)
    An emerging direction of quantum computing is to establish meaningful quantum applications in various fields of artificial intelligence, including natural language processing (NLP). Although some efforts based on syntactic analysis have opened the door to research in Quantum NLP (QNLP), limitations such as heavy syntactic preprocessing and syntax-dependent network architecture make them impracticable on larger and real-world data sets. In this paper, we propose a new simple network architecture, called the quantum self-attention neural network (QSANN), which can compensate for these limitations. Specifically, we introduce the self-attention mechanism into quantum neural networks and then utilize a Gaussian projected quantum self-attention serving as a sensible quantum version of self-attention. As a result, QSANN is effective and scalable on larger data sets and has the desirable property of being implementable on near-term quantum devices. In particular, our QSANN outperforms the best existing QNLP model based on syntactic analysis as well as a simple classical self-attention neural network in numerical experiments of text classification tasks on public data sets. We further show that our method exhibits robustness to low-level quantum noises and showcases resilience to quantum neural network architectures.
    Augment to Interpret: Unsupervised and Inherently Interpretable Graph Embeddings. (arXiv:2309.16564v1 [cs.LG])
    Unsupervised learning allows us to leverage unlabelled data, which has become abundantly available, and to create embeddings that are usable on a variety of downstream tasks. However, the typical lack of interpretability of unsupervised representation learning has become a limiting factor with regard to recent transparent-AI regulations. In this paper, we study graph representation learning and we show that data augmentation that preserves semantics can be learned and used to produce interpretations. Our framework, which we named INGENIOUS, creates inherently interpretable embeddings and eliminates the need for costly additional post-hoc analysis. We also introduce additional metrics addressing the lack of formalism and metrics in the understudied area of unsupervised-representation learning interpretability. Our results are supported by an experimental study applied to both graph-level and node-level tasks and show that interpretable embeddings provide state-of-the-art performance on subsequent downstream tasks.
    Unsupervised Fact Verification by Language Model Distillation. (arXiv:2309.16540v1 [cs.CL])
    Unsupervised fact verification aims to verify a claim using evidence from a trustworthy knowledge base without any kind of data annotation. To address this challenge, algorithms must produce features for every claim that are both semantically meaningful, and compact enough to find a semantic alignment with the source information. In contrast to previous work, which tackled the alignment problem by learning over annotated corpora of claims and their corresponding labels, we propose SFAVEL (Self-supervised Fact Verification via Language Model Distillation), a novel unsupervised framework that leverages pre-trained language models to distil self-supervised features into high-quality claim-fact alignments without the need for annotations. This is enabled by a novel contrastive loss function that encourages features to attain high-quality claim and evidence alignments whilst preserving the semantic relationships across the corpora. Notably, we present results that achieve a new state-of-the-art on the standard FEVER fact verification benchmark (+8% accuracy) with linear evaluation.
    Reusability report: Prostate cancer stratification with diverse biologically-informed neural architectures. (arXiv:2309.16645v1 [cs.LG])
    In Elmarakeby et al., "Biologically informed deep neural network for prostate cancer discovery", a feedforward neural network with biologically informed, sparse connections (P-NET) was presented to model the state of prostate cancer. We verified the reproducibility of the study conducted by Elmarakeby et al., using both their original codebase and our own re-implementation using more up-to-date libraries. We quantified the contribution of network sparsification by Reactome biological pathways, and confirmed its importance to P-NET's superior performance. Furthermore, we explored alternative neural architectures and approaches to incorporating biological information into the networks. We experimented with three types of graph neural networks on the same training data, and investigated the clinical prediction agreement between different models. Our analyses demonstrated that deep neural networks with distinct architectures make incorrect predictions for individual patients that are persistent across different initializations of a specific neural architecture. This suggests that different neural architectures are sensitive to different aspects of the data, an important yet under-explored challenge for clinical prediction tasks.
    Using Weak Supervision and Data Augmentation in Question Answering. (arXiv:2309.16175v1 [cs.CL])
    The onset of the COVID-19 pandemic accentuated the need for access to biomedical literature to answer timely and disease-specific questions. During the early days of the pandemic, one of the biggest challenges we faced was the lack of peer-reviewed biomedical articles on COVID-19 that could be used to train machine learning models for question answering (QA). In this paper, we explore the roles weak supervision and data augmentation play in training deep neural network QA models. First, we investigate whether labels generated automatically from the structured abstracts of scholarly papers using an information retrieval algorithm, BM25, provide a weak supervision signal to train an extractive QA model. We also curate new QA pairs using information retrieval techniques, guided by the clinicaltrials.gov schema and the structured abstracts of articles, in the absence of annotated data from biomedical domain experts. Furthermore, we explore augmenting the training data of a deep neural network model with linguistic features from external sources such as lexical databases to account for variations in word morphology and meaning. To better utilize our training data, we apply curriculum learning to domain adaptation, fine-tuning our QA model in stages based on characteristics of the QA pairs. We evaluate our methods in the context of QA models at the core of a system to answer questions about COVID-19.
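    A minimal sketch of BM25-based weak supervision in the spirit described above, using the rank_bm25 package; the toy corpus and question are invented for illustration, and the real pipeline additionally maps retrieved sections to extractive QA labels.

```python
from rank_bm25 import BM25Okapi

# Hypothetical toy corpus of abstract sections; in practice these would be
# sentences from the structured abstracts of scholarly papers.
sections = [
    "remdesivir reduced recovery time in hospitalized covid-19 patients",
    "the study enrolled adults with confirmed sars-cov-2 infection",
    "no significant difference in mortality was observed between groups",
]
bm25 = BM25Okapi([s.split() for s in sections])

question = "does remdesivir shorten recovery time"
scores = bm25.get_scores(question.split())
# The top-ranked section is taken as a weakly-labeled answer passage.
weak_answer = sections[scores.argmax()]
```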
    End-to-end Risk Prediction of Atrial Fibrillation from the 12-Lead ECG by Deep Neural Networks. (arXiv:2309.16335v1 [cs.LG])
    Background: Atrial fibrillation (AF) is one of the most common cardiac arrhythmias that affects millions of people each year worldwide and it is closely linked to increased risk of cardiovascular diseases such as stroke and heart failure. Machine learning methods have shown promising results in evaluating the risk of developing atrial fibrillation from the electrocardiogram. We aim to develop and evaluate one such algorithm on a large CODE dataset collected in Brazil. Results: The deep neural network model identified patients without indication of AF in the presented ECG but who will develop AF in the future with an AUC score of 0.845. From our survival model, we obtain that patients in the high-risk group (i.e. with the probability of a future AF case being greater than 0.7) are 50% more likely to develop AF within 40 weeks, while patients belonging to the minimal-risk group (i.e. with the probability of a future AF case being less than or equal to 0.1) have more than 85% chance of remaining AF free up until after seven years. Conclusion: We developed and validated a model for AF risk prediction. If applied in clinical practice, the model possesses the potential of providing valuable and useful information in decision-making and patient management processes.
    Universal Sleep Decoder: Aligning awake and sleep neural representation across subjects. (arXiv:2309.16457v1 [cs.LG])
    Decoding memory content from brain activity during sleep has long been a goal in neuroscience. While spontaneous reactivation of memories during sleep in rodents is known to support memory consolidation and offline learning, capturing memory replay in humans is challenging due to the absence of well-annotated sleep datasets and the substantial differences in neural patterns between wakefulness and sleep. To address these challenges, we designed a novel cognitive neuroscience experiment and collected a comprehensive, well-annotated electroencephalography (EEG) dataset from 52 subjects during both wakefulness and sleep. Leveraging this benchmark dataset, we developed the Universal Sleep Decoder (USD) to align neural representations between wakefulness and sleep across subjects. Our model achieves up to 16.6% top-1 zero-shot accuracy on unseen subjects, comparable to decoding performances using individual sleep data. Furthermore, fine-tuning USD on test subjects enhances decoding accuracy to 25.9% top-1 accuracy, a substantial improvement over the baseline chance of 6.7%. Model comparison and ablation analyses reveal that our design choices, including the use of (i) an additional contrastive objective to integrate awake and sleep neural signals and (ii) the pretrain-finetune paradigm to incorporate different subjects, significantly contribute to these performances. Collectively, our findings and methodologies represent a significant advancement in the field of sleep decoding.
    Comparing Active Learning Performance Driven by Gaussian Processes or Bayesian Neural Networks for Constrained Trajectory Exploration. (arXiv:2309.16114v1 [cs.RO])
    Robots with increasing autonomy progress our space exploration capabilities, particularly for in-situ exploration and sampling to stand in for human explorers. Currently, humans drive robots to meet scientific objectives, but depending on the robot's location, the exchange of information and driving commands between the human operator and robot may cause undue delays in mission fulfillment. An autonomous robot encoded with a scientific objective and an exploration strategy incurs no communication delays and can fulfill missions more quickly. Active learning algorithms offer this capability of intelligent exploration, but the underlying model structure varies the performance of the active learning algorithm in accurately forming an understanding of the environment. In this paper, we investigate the performance differences between active learning algorithms driven by Gaussian processes or Bayesian neural networks for exploration strategies encoded on agents that are constrained in their trajectories, like planetary surface rovers. These two active learning strategies were tested in a simulation environment against science-blind strategies to predict the spatial distribution of a variable of interest along multiple datasets. The performance metrics of interest are model accuracy in root mean squared (RMS) error, training time, model convergence, total distance traveled until convergence, and total samples until convergence. Active learning strategies encoded with Gaussian processes require less computation to train, converge to an accurate model more quickly, and propose trajectories of shorter distance, except in a few complex environments in which Bayesian neural networks achieve a more accurate model in the large data regime due to their more expressive functional bases. The paper concludes with advice on when and how to implement either exploration strategy for future space missions.
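    As a hedged illustration of the Gaussian-process exploration strategy compared here (not the authors' code), the sketch below fits a GP to visited locations and proposes the reachable candidate with the highest predictive standard deviation; the kernel and data are placeholder assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def next_sample(X_visited, y_visited, X_candidates):
    """Pick the reachable candidate location with the largest predictive std.

    X_visited, y_visited : locations already sampled and their measurements
    X_candidates         : locations reachable under the trajectory constraint
    """
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
    gp.fit(X_visited, y_visited)
    _, std = gp.predict(X_candidates, return_std=True)
    return X_candidates[np.argmax(std)]

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(5, 2))            # visited locations
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(5)
candidates = rng.uniform(0, 10, size=(50, 2))  # points near the current position
print(next_sample(X, y, candidates))
```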
    Review of Machine Learning Methods for Additive Manufacturing of Functionally Graded Materials. (arXiv:2309.16571v1 [cs.LG])
    Additive manufacturing has revolutionized the manufacturing of complex parts by enabling direct material joining and offers several advantages such as cost-effective manufacturing of complex parts, reducing manufacturing waste, and opening new possibilities for manufacturing automation. One group of materials for which additive manufacturing holds great potential for enhancing component performance and properties is Functionally Graded Materials (FGMs). FGMs are advanced composite materials that exhibit smoothly varying properties, making them desirable for applications in aerospace, automobile, biomedical, and defense industries. Such composition differs from traditional composite materials, since the location-dependent composition changes gradually in FGMs, leading to enhanced properties. Recently, machine learning techniques have emerged as a promising means for fabrication of FGMs through optimizing processing parameters, improving product quality, and detecting manufacturing defects. This paper first provides a brief literature review of works related to FGM fabrication, followed by a review of works employing machine learning in additive manufacturing. Afterward, we provide an overview of published works on the application of machine learning methods in Directed Energy Deposition and for the fabrication of FGMs.
    EFFL: Egalitarian Fairness in Federated Learning for Mitigating Matthew Effect. (arXiv:2309.16338v1 [cs.LG])
    Recent advances in federated learning (FL) enable collaborative training of machine learning (ML) models from large-scale and widely dispersed clients while protecting their privacy. However, when different clients' datasets are heterogeneous, traditional FL mechanisms produce a global model that does not adequately represent the poorer clients with limited data resources, resulting in lower accuracy and higher bias on their local data. According to the Matthew effect, which describes how the advantaged gain more advantage and the disadvantaged lose more over time, deploying such a global model in client applications may worsen the resource disparity among the clients and harm the principles of social welfare and fairness. To mitigate the Matthew effect, we propose Egalitarian Fairness Federated Learning (EFFL), where egalitarian fairness means that the global model learned from FL has: (1) equal accuracy among clients; (2) equal decision bias among clients. Besides achieving egalitarian fairness among the clients, EFFL also aims for performance optimality, minimizing the empirical risk loss and the bias for each client; both are essential for any ML model training, whether centralized or decentralized. We formulate EFFL as a multi-constrained multi-objective optimization (MCMOO) problem, with the decision bias and egalitarian fairness as constraints and the minimization of the empirical risk losses on all clients as multiple objectives to be optimized. We propose a gradient-based three-stage algorithm to obtain the Pareto optimal solutions within the constraint space. Extensive experiments demonstrate that EFFL outperforms other state-of-the-art FL algorithms in achieving a high-performance global model with enhanced egalitarian fairness among all clients.
    Task-Oriented Koopman-Based Control with Contrastive Encoder. (arXiv:2309.16077v1 [cs.RO])
    We present task-oriented Koopman-based control that utilizes end-to-end reinforcement learning and contrastive encoder to simultaneously learn the Koopman latent embedding, operator and associated linear controller within an iterative loop. By prioritizing the task cost as main objective for controller learning, we reduce the reliance of controller design on a well-identified model, which extends Koopman control beyond low-dimensional systems to high-dimensional, complex nonlinear systems, including pixel-based scenarios.
    Unmasking the Chameleons: A Benchmark for Out-of-Distribution Detection in Medical Tabular Data. (arXiv:2309.16220v1 [cs.LG])
    Despite their success, Machine Learning (ML) models do not generalize effectively to data not originating from the training distribution. To reliably employ ML models in real-world healthcare systems and avoid inaccurate predictions on out-of-distribution (OOD) data, it is crucial to detect OOD samples. Numerous OOD detection approaches have been suggested in other fields - especially in computer vision - but it remains unclear whether the challenge is resolved when dealing with medical tabular data. To answer this pressing need, we propose an extensive reproducible benchmark to compare different methods across a suite of tests including both near and far OODs. Our benchmark leverages the latest versions of eICU and MIMIC-IV, two public datasets encompassing tens of thousands of ICU patients in several hospitals. We consider a wide array of density-based methods and SOTA post-hoc detectors across diverse predictive architectures, including MLP, ResNet, and Transformer. Our findings show that i) the problem appears to be solved for far-OODs, but remains open for near-OODs; ii) post-hoc methods alone perform poorly, but improve substantially when coupled with distance-based mechanisms; iii) the transformer architecture is far less overconfident compared to MLP and ResNet.
    Compositional Sculpting of Iterative Generative Processes. (arXiv:2309.16115v1 [cs.LG])
    High training costs of generative models and the need to fine-tune them for specific tasks have created a strong interest in model reuse and composition. A key challenge in composing iterative generative processes, such as GFlowNets and diffusion models, is that to realize the desired target distribution, all steps of the generative process need to be coordinated, and satisfy delicate balance conditions. In this work, we propose Compositional Sculpting: a general approach for defining compositions of iterative generative processes. We then introduce a method for sampling from these compositions built on classifier guidance. We showcase ways to accomplish compositional sculpting in both GFlowNets and diffusion models. We highlight two binary operations $\unicode{x2014}$ the harmonic mean ($p_1 \otimes p_2$) and the contrast ($p_1 \unicode{x25D1}\,p_2$) between pairs, and the generalization of these operations to multiple component distributions. We offer empirical results on image and molecular generation tasks.  ( 2 min )
    Differential 2D Copula Approximating Transforms via Sobolev Training: 2-Cats Networks. (arXiv:2309.16391v1 [cs.LG])
    Copulas are a powerful statistical tool that captures dependencies across data dimensions. When applying copulas, we can estimate multivariate distribution functions by initially estimating independent marginals, an easy task, and then a single copulating function, $C$, to connect the marginals, a hard task. For two-dimensional data, a copula is a two-increasing function $C: \mathbf{I}^2 \rightarrow \mathbf{I}$, $(u, v) \mapsto C(u, v)$, where $\mathbf{I} = [0, 1]$. In this paper, we show how Neural Networks (NNs) can approximate any two-dimensional copula non-parametrically. Our approach, denoted as 2-Cats, is inspired by the Physics-Informed Neural Networks and Sobolev Training literature. Not only do we show that we can estimate the output of a 2D copula better than the state-of-the-art, but our approach is also non-parametric and respects the mathematical properties of a copula $C$.
    LagrangeBench: A Lagrangian Fluid Mechanics Benchmarking Suite. (arXiv:2309.16342v1 [cs.LG])
    Machine learning has been successfully applied to grid-based PDE modeling in various scientific applications. However, learned PDE solvers based on Lagrangian particle discretizations, which are the preferred approach to problems with free surfaces or complex physics, remain largely unexplored. We present LagrangeBench, the first benchmarking suite for Lagrangian particle problems, focusing on temporal coarse-graining. In particular, our contribution is: (a) seven new fluid mechanics datasets (four in 2D and three in 3D) generated with the Smoothed Particle Hydrodynamics (SPH) method including the Taylor-Green vortex, lid-driven cavity, reverse Poiseuille flow, and dam break, each of which includes different physics like solid wall interactions or free surface, (b) efficient JAX-based API with various recent training strategies and neighbors search routine, and (c) JAX implementation of established Graph Neural Networks (GNNs) like GNS and SEGNN with baseline results. Finally, to measure the performance of learned surrogates we go beyond established position errors and introduce physical metrics like kinetic energy MSE and Sinkhorn distance for the particle distribution. Our codebase is available under the URL: https://github.com/tumaer/lagrangebench
    Transfer Learning for Bayesian Optimization on Heterogeneous Search Spaces. (arXiv:2309.16597v1 [cs.LG])
    Bayesian optimization (BO) is a popular black-box function optimization method, which makes sequential decisions based on a Bayesian model, typically a Gaussian process (GP), of the function. To ensure the quality of the model, transfer learning approaches have been developed to automatically design GP priors by learning from observations on "training" functions. These training functions are typically required to have the same domain as the "test" function (black-box function to be optimized). In this paper, we introduce MPHD, a model pre-training method on heterogeneous domains, which uses a neural net mapping from domain-specific contexts to specifications of hierarchical GPs. MPHD can be seamlessly integrated with BO to transfer knowledge across heterogeneous search spaces. Our theoretical and empirical results demonstrate the validity of MPHD and its superior performance on challenging black-box function optimization tasks.
    Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints. (arXiv:2309.16240v1 [cs.LG])
    The increasing capabilities of large language models (LLMs) raise opportunities for artificial general intelligence but concurrently amplify safety concerns, such as potential misuse of AI systems, necessitating effective AI alignment. Reinforcement Learning from Human Feedback (RLHF) has emerged as a promising pathway towards AI alignment but brings forth challenges due to its complexity and dependence on a separate reward model. Direct Preference Optimization (DPO) has been proposed as an alternative, and it remains equivalent to RLHF under the reverse KL regularization constraint. This paper presents $f$-DPO, a generalized approach to DPO by incorporating diverse divergence constraints. We show that under certain $f$-divergences, including Jensen-Shannon divergence, forward KL divergences and $\alpha$-divergences, the complex relationship between the reward and optimal policy can also be simplified by addressing the Karush-Kuhn-Tucker conditions. This eliminates the need for estimating the normalizing constant in the Bradley-Terry model and enables a tractable mapping between the reward function and the optimal policy. Our approach optimizes LLMs to align with human preferences in a more efficient and supervised manner under a broad set of divergence constraints. Empirically, adopting these divergences ensures a balance between alignment performance and generation diversity. Importantly, $f$-DPO outperforms PPO-based methods in divergence efficiency, and divergence constraints directly influence expected calibration error (ECE).
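    For reference, the familiar reverse-KL special case that $f$-DPO generalizes can be sketched as below; the per-response log-probabilities are assumed to be computed elsewhere, and the other $f$-divergence variants introduced in the paper are not shown.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard (reverse-KL) DPO loss on per-response log-probabilities.

    The f-DPO variants replace this with other divergence constraints;
    only the familiar reverse-KL special case is sketched here.
    """
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (chosen_logratio - rejected_logratio)).mean()

# Toy usage with hypothetical log-probabilities for a batch of 4 preference pairs.
lp = torch.randn(4)
loss = dpo_loss(lp, lp - 0.5, torch.zeros(4), torch.zeros(4))
```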
    Recent Advances of Differential Privacy in Centralized Deep Learning: A Systematic Survey. (arXiv:2309.16398v1 [cs.LG])
    Differential Privacy has become a widely popular method for data protection in machine learning, especially since it allows formulating strict mathematical privacy guarantees. This survey provides an overview of the state-of-the-art of differentially private centralized deep learning, thorough analyses of recent advances and open problems, as well as a discussion of potential future developments in the field. Based on a systematic literature review, the following topics are addressed: auditing and evaluation methods for private models, improvements of privacy-utility trade-offs, protection against a broad range of threats and attacks, differentially private generative models, and emerging application domains.
    ShapeDBA: Generating Effective Time Series Prototypes using ShapeDTW Barycenter Averaging. (arXiv:2309.16353v1 [cs.LG])
    Time series data can be found in almost every domain, ranging from the medical field to manufacturing and wireless communication. Generating realistic and useful exemplars and prototypes is a fundamental data analysis task. In this paper, we investigate a novel approach to generating realistic and useful exemplars and prototypes for time series data, based on a new form of time series average, the ShapeDTW Barycentric Average. Existing time series prototyping approaches rely on the Dynamic Time Warping (DTW) similarity measure, such as DTW Barycenter Averaging (DBA) and SoftDBA. These approaches suffer from a common problem: they generate out-of-distribution artifacts in their prototypes. This is mostly caused by the DTW variant used, which detects absolute rather than neighborhood similarities. Our proposed method, ShapeDBA, uses the ShapeDTW variant of DTW, which overcomes this issue. We chose time series clustering, a popular form of time series analysis, to evaluate the outcome of ShapeDBA compared to the other prototyping approaches. Coupled with the k-means clustering algorithm, and evaluated on a total of 123 datasets from the UCR archive, our proposed averaging approach is able to achieve new state-of-the-art results in terms of Adjusted Rand Index.
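    To make the baseline concrete, classical DBA is available in tslearn and can be sketched as below; ShapeDBA itself is, to our knowledge, not part of that library, so this snippet only illustrates the prototyping setup the paper improves on, using invented toy series.

```python
import numpy as np
from tslearn.barycenters import dtw_barycenter_averaging

# Toy cluster of similar time series; a prototype is their DTW barycenter.
rng = np.random.default_rng(0)
series = np.stack(
    [np.sin(np.linspace(0, 6, 80) + rng.uniform(-0.3, 0.3)) for _ in range(10)]
)[..., None]                                   # shape (n_ts, length, 1)

prototype = dtw_barycenter_averaging(series, max_iter=30)
```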
    A Primer on Bayesian Neural Networks: Review and Debates. (arXiv:2309.16314v1 [stat.ML])
    Neural networks have achieved remarkable performance across various problem domains, but their widespread applicability is hindered by inherent limitations such as overconfidence in predictions, lack of interpretability, and vulnerability to adversarial attacks. To address these challenges, Bayesian neural networks (BNNs) have emerged as a compelling extension of conventional neural networks, integrating uncertainty estimation into their predictive capabilities. This comprehensive primer presents a systematic introduction to the fundamental concepts of neural networks and Bayesian inference, elucidating their synergistic integration for the development of BNNs. The target audience comprises statisticians with a potential background in Bayesian methods but lacking deep learning expertise, as well as machine learners proficient in deep neural networks but with limited exposure to Bayesian statistics. We provide an overview of commonly employed priors, examining their impact on model behavior and performance. Additionally, we delve into the practical considerations associated with training and inference in BNNs. Furthermore, we explore advanced topics within the realm of BNN research, acknowledging the existence of ongoing debates and controversies. By offering insights into cutting-edge developments, this primer not only equips researchers and practitioners with a solid foundation in BNNs, but also illuminates the potential applications of this dynamic field. As a valuable resource, it fosters an understanding of BNNs and their promising prospects, facilitating further advancements in the pursuit of knowledge and innovation.
    A framework for paired-sample hypothesis testing for high-dimensional data. (arXiv:2309.16274v1 [stat.ML])
    The standard paired-sample testing approach in the multidimensional setting applies multiple univariate tests on the individual features, followed by p-value adjustments. Such an approach suffers when the data carry numerous features. A number of studies have shown that classification accuracy can be seen as a proxy for two-sample testing. However, neither theoretical foundations nor practical recipes have been proposed so far on how this strategy could be extended to multidimensional paired-sample testing. In this work, we put forward the idea that scoring functions can be produced by the decision rules defined by the perpendicular bisecting hyperplanes of the line segments connecting each pair of instances. Then, the optimal scoring function can be obtained by the pseudomedian of those rules, which we estimate by extending naturally the Hodges-Lehmann estimator. We accordingly propose a framework of a two-step testing procedure. First, we estimate the bisecting hyperplanes for each pair of instances and an aggregated rule derived through the Hodges-Lehmann estimator. The paired samples are scored by this aggregated rule to produce a unidimensional representation. Second, we perform a Wilcoxon signed-rank test on the obtained representation. Our experiments indicate that our approach has substantial performance gains in testing accuracy compared to the traditional multivariate and multiple testing, while at the same time estimates each feature's contribution to the final result.
    RealFill: Reference-Driven Generation for Authentic Image Completion. (arXiv:2309.16668v1 [cs.CV])
    Recent advances in generative imagery have brought forth outpainting and inpainting models that can produce high-quality, plausible image content in unknown regions, but the content these models hallucinate is necessarily inauthentic, since the models lack sufficient context about the true scene. In this work, we propose RealFill, a novel generative approach for image completion that fills in missing regions of an image with the content that should have been there. RealFill is a generative inpainting model that is personalized using only a few reference images of a scene. These reference images do not have to be aligned with the target image, and can be taken with drastically varying viewpoints, lighting conditions, camera apertures, or image styles. Once personalized, RealFill is able to complete a target image with visually compelling contents that are faithful to the original scene. We evaluate RealFill on a new image completion benchmark that covers a set of diverse and challenging scenarios, and find that it outperforms existing approaches by a large margin. See more results on our project page: https://realfill.github.io
    GInX-Eval: Towards In-Distribution Evaluation of Graph Neural Network Explanations. (arXiv:2309.16223v1 [cs.AI])
    Diverse explainability methods of graph neural networks (GNN) have recently been developed to highlight the edges and nodes in the graph that contribute the most to the model predictions. However, it is not clear yet how to evaluate the correctness of those explanations, whether it is from a human or a model perspective. One unaddressed bottleneck in the current evaluation procedure is the problem of out-of-distribution explanations, whose distribution differs from those of the training data. This important issue affects existing evaluation metrics such as the popular faithfulness or fidelity score. In this paper, we show the limitations of faithfulness metrics. We propose GInX-Eval (Graph In-distribution eXplanation Evaluation), an evaluation procedure of graph explanations that overcomes the pitfalls of faithfulness and offers new insights on explainability methods. Using a retraining strategy, the GInX score measures how informative removed edges are for the model and the EdgeRank score evaluates if explanatory edges are correctly ordered by their importance. GInX-Eval verifies if ground-truth explanations are instructive to the GNN model. In addition, it shows that many popular methods, including gradient-based methods, produce explanations that are not better than a random designation of edges as important subgraphs, challenging the findings of current works in the area. Results with GInX-Eval are consistent across multiple datasets and align with human evaluation.
    Predicting Cardiovascular Complications in Post-COVID-19 Patients Using Data-Driven Machine Learning Models. (arXiv:2309.16059v1 [cs.LG])
    The COVID-19 pandemic has globally posed numerous health challenges, notably the emergence of post-COVID-19 cardiovascular complications. This study addresses this by utilizing data-driven machine learning models to predict such complications in 352 post-COVID-19 patients from Iraq. Clinical data, including demographics, comorbidities, lab results, and imaging, were collected and used to construct predictive models. These models, leveraging various machine learning algorithms, demonstrated commendable performance in identifying patients at risk. Early detection through these models promises timely interventions and improved outcomes. In conclusion, this research underscores the potential of data-driven machine learning for predicting post-COVID-19 cardiovascular complications, emphasizing the need for continued validation and research in diverse clinical settings.
    Mixup Your Own Pairs. (arXiv:2309.16633v1 [cs.LG])
    In representation learning, regression has traditionally received less attention than classification. Directly applying representation learning techniques designed for classification to regression often results in fragmented representations in the latent space, yielding sub-optimal performance. In this paper, we argue that the potential of contrastive learning for regression has been overshadowed due to the neglect of two crucial aspects: ordinality-awareness and hardness. To address these challenges, we advocate "mixup your own contrastive pairs for supervised contrastive regression", instead of relying solely on real/augmented samples. Specifically, we propose Supervised Contrastive Learning for Regression with Mixup (SupReMix). It takes anchor-inclusive mixtures (mixup of the anchor and a distinct negative sample) as hard negative pairs and anchor-exclusive mixtures (mixup of two distinct negative samples) as hard positive pairs at the embedding level. This strategy formulates harder contrastive pairs by integrating richer ordinal information. Through extensive experiments on six regression datasets including 2D images, volumetric images, text, tabular data, and time-series signals, coupled with theoretical analysis, we demonstrate that SupReMix pre-training fosters continuous ordered representations of regression data, resulting in significant improvement in regression performance. Furthermore, SupReMix is superior to other approaches in a range of regression challenges including transfer learning, imbalanced training data, and scenarios with fewer training samples.
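    As a rough, hedged illustration of the pair-construction idea (not the authors' code; the function name, the uniform mixing coefficient, and the random negative selection are assumptions made for this sketch):

        # Illustrative sketch of SupReMix-style mixup pairs at the embedding level.
        import numpy as np

        def make_supremix_pairs(anchor_emb, neg_embs, rng=np.random.default_rng(0)):
            """anchor_emb: embedding of the anchor; neg_embs: embeddings of samples
            with labels distinct from the anchor's (negatives)."""
            lam = rng.uniform()
            # Anchor-inclusive mixture (anchor mixed with one distinct negative),
            # used as a HARD NEGATIVE pair for the anchor.
            j = rng.integers(len(neg_embs))
            hard_negative = lam * anchor_emb + (1.0 - lam) * neg_embs[j]
            # Anchor-exclusive mixture (two distinct negatives mixed together),
            # used as a HARD POSITIVE pair; the paper ties the mixing to ordinal
            # label information, which this uniform draw only approximates.
            j1, j2 = rng.choice(len(neg_embs), size=2, replace=False)
            hard_positive = lam * neg_embs[j1] + (1.0 - lam) * neg_embs[j2]
            return hard_negative, hard_positive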
    The Trickle-down Impact of Reward (In-)consistency on RLHF. (arXiv:2309.16155v1 [cs.CL])
    Standard practice within Reinforcement Learning from Human Feedback (RLHF) involves optimizing against a Reward Model (RM), which itself is trained to reflect human preferences for desirable generations. A notable subject that is understudied is the (in-)consistency of RMs -- whether they can recognize the semantic changes to different prompts and appropriately adapt their reward assignments -- and their impact on the downstream RLHF model. In this paper, we visit a series of research questions relevant to RM inconsistency: (1) How can we measure the consistency of reward models? (2) How consistent are the existing RMs and how can we improve them? (3) In what ways does reward inconsistency influence the chatbots resulting from the RLHF model training? We propose Contrast Instructions -- a benchmarking strategy for the consistency of RM. Each example in Contrast Instructions features a pair of lexically similar instructions with different ground truth responses. A consistent RM is expected to rank the corresponding instruction and response higher than other combinations. We observe that current RMs trained with the standard ranking objective fail miserably on Contrast Instructions compared to average humans. To show that RM consistency can be improved efficiently without using extra training budget, we propose two techniques ConvexDA and RewardFusion, which enhance reward consistency through extrapolation during the RM training and inference stage, respectively. We show that RLHF models trained with a more consistent RM yield more useful responses, suggesting that reward inconsistency exhibits a trickle-down effect on the downstream RLHF process.
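    A minimal sketch of the kind of pairwise consistency check described above (illustrative only, not the benchmark's code; the reward_model(instruction, response) interface and the field names are assumptions):

        def contrast_consistency(reward_model, examples):
            """Each example holds two lexically similar instructions (i1, i2) with
            different ground-truth responses (r1, r2); a consistent reward model
            should score the matching combinations above the mismatched ones."""
            hits = 0
            for ex in examples:
                i1, r1, i2, r2 = ex["i1"], ex["r1"], ex["i2"], ex["r2"]
                consistent = (reward_model(i1, r1) > reward_model(i1, r2)
                              and reward_model(i2, r2) > reward_model(i2, r1))
                hits += int(consistent)
            return hits / len(examples)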
    Digital Twin-based Anomaly Detection with Curriculum Learning in Cyber-physical Systems. (arXiv:2309.15995v1 [cs.LG])
    Anomaly detection is critical to ensure the security of cyber-physical systems (CPS). However, due to the increasing complexity of attacks and of CPS themselves, anomaly detection in CPS is becoming more and more challenging. In our previous work, we proposed a digital twin-based anomaly detection method, called ATTAIN, which takes advantage of both historical and real-time data of CPS. However, such data vary significantly in terms of difficulty. Therefore, similar to human learning processes, deep learning models (e.g., ATTAIN) can benefit from an easy-to-difficult curriculum. To this end, in this paper, we present a novel approach, named digitaL twin-based Anomaly deTecTion wIth Curriculum lEarning (LATTICE), which extends ATTAIN by introducing curriculum learning to optimize its learning paradigm. LATTICE assigns each sample a difficulty score before it is fed into a training scheduler. The training scheduler samples batches of training data based on these difficulty scores such that learning from easy to difficult data can be performed. To evaluate LATTICE, we use five publicly available datasets collected from five real-world CPS testbeds. We compare LATTICE with ATTAIN and two other state-of-the-art anomaly detectors. Evaluation results show that LATTICE outperforms all three baselines, including ATTAIN, by 0.906%-2.367% in terms of the F1 score. LATTICE also, on average, reduces the training time of ATTAIN by 4.2% on the five datasets and is on par with the baselines in terms of detection delay time.
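    As an illustrative sketch of the easy-to-difficult scheduling idea (not LATTICE's actual scheduler; the pacing rule and the assumption of precomputed per-sample difficulty scores are simplifications):

        import numpy as np

        def curriculum_batches(samples, difficulty, batch_size, pace=0.2, seed=0):
            """Yield batches from a pool that starts with the easiest samples and
            grows by a fraction `pace` of the dataset each epoch until all data
            is used, so training proceeds from easy to difficult examples."""
            rng = np.random.default_rng(seed)
            order = np.argsort(difficulty)          # easiest samples first
            n, epoch = len(samples), 0
            while True:
                k = min(n, int(np.ceil((epoch + 1) * pace * n)))
                pool = order[:k].copy()
                rng.shuffle(pool)
                for start in range(0, k, batch_size):
                    yield [samples[i] for i in pool[start:start + batch_size]]
                if k == n:
                    break
                epoch += 1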
    Identifying Risk Factors for Post-COVID-19 Mental Health Disorders: A Machine Learning Perspective. (arXiv:2309.16055v1 [cs.LG])
    In this study, we leveraged machine learning techniques to identify risk factors associated with post-COVID-19 mental health disorders. Our analysis, based on data collected from 669 patients across various provinces in Iraq, yielded valuable insights. We found that age, gender, and geographical region of residence were significant demographic factors influencing the likelihood of developing mental health disorders in post-COVID-19 patients. Additionally, comorbidities and the severity of COVID-19 illness were important clinical predictors. Psychosocial factors, such as social support, coping strategies, and perceived stress levels, also played a substantial role. Our findings emphasize the complex interplay of multiple factors in the development of mental health disorders following COVID-19 recovery. Healthcare providers and policymakers should consider these risk factors when designing targeted interventions and support systems for individuals at risk. Machine learning-based approaches can provide a valuable tool for predicting and preventing adverse mental health outcomes in post-COVID-19 patients. Further research and prospective studies are needed to validate these findings and enhance our understanding of the long-term psychological impact of the COVID-19 pandemic. This study contributes to the growing body of knowledge regarding the mental health consequences of the COVID-19 pandemic and underscores the importance of a multidisciplinary approach to address the diverse needs of individuals on the path to recovery. Keywords: COVID-19, mental health, risk factors, machine learning, Iraq
    Learning Interpretable Characteristic Kernels via Decision Forests. (arXiv:1812.00029v3 [stat.ML] UPDATED)
    Decision forests are widely used for classification and regression tasks. A lesser known property of tree-based methods is that one can construct a proximity matrix from the tree(s), and these proximity matrices are induced kernels. While there has been extensive research on the applications and properties of kernels, there is relatively little research on kernels induced by decision forests. We construct Kernel Mean Embedding Random Forests (KMERF), which induce kernels from random trees and/or forests using leaf-node proximity. We introduce the notion of an asymptotically characteristic kernel, and prove that KMERF kernels are asymptotically characteristic for both discrete and continuous data. Because KMERF is data-adaptive, we suspected it would outperform kernels selected a priori on finite sample data. We illustrate that KMERF nearly dominates current state-of-the-art kernel-based tests across a diverse range of high-dimensional two-sample and independence testing settings. Furthermore, our forest-based approach is interpretable, and provides feature importance metrics that readily distinguish important dimensions, unlike other high-dimensional non-parametric testing procedures. Hence, this work demonstrates the decision forest-based kernel can be more powerful and more interpretable than existing methods, flying in the face of conventional wisdom of the trade-off between the two.
    Enhancing Cross-Category Learning in Recommendation Systems with Multi-Layer Embedding Training. (arXiv:2309.15881v1 [cs.LG])
    Modern DNN-based recommendation systems rely on training-derived embeddings of sparse features. Input sparsity makes obtaining high-quality embeddings for rarely-occurring categories harder as their representations are updated infrequently. We demonstrate a training-time technique to produce superior embeddings via effective cross-category learning and theoretically explain its surprising effectiveness. The scheme, termed the multi-layer embeddings training (MLET), trains embeddings using factorization of the embedding layer, with an inner dimension higher than the target embedding dimension. For inference efficiency, MLET converts the trained two-layer embedding into a single-layer one thus keeping inference-time model size unchanged. Empirical superiority of MLET is puzzling as its search space is not larger than that of the single-layer embedding. The strong dependence of MLET on the inner dimension is even more surprising. We develop a theory that explains both of these behaviors by showing that MLET creates an adaptive update mechanism modulated by the singular vectors of embeddings. When tested on multiple state-of-the-art recommendation models for click-through rate (CTR) prediction tasks, MLET consistently produces better models, especially for rare items. At constant model quality, MLET allows embedding dimension, and model size, reduction by up to 16x, and 5.8x on average, across the models.
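    A hedged PyTorch sketch of the two-layer factorization and its inference-time collapse (illustrative, not the authors' implementation; the class and argument names are assumptions):

        import torch
        import torch.nn as nn

        class MLETEmbedding(nn.Module):
            """Train an embedding through a wider inner factor, then collapse it."""
            def __init__(self, num_categories, target_dim, inner_dim):
                super().__init__()
                assert inner_dim >= target_dim      # inner dimension is chosen larger
                self.inner = nn.Embedding(num_categories, inner_dim)
                self.proj = nn.Linear(inner_dim, target_dim, bias=False)

            def forward(self, idx):                 # training-time two-layer lookup
                return self.proj(self.inner(idx))

            def collapse(self):
                """Fold the two factors into a single embedding table for inference,
                so model size and lookup cost match a plain single-layer embedding."""
                with torch.no_grad():
                    table = self.inner.weight @ self.proj.weight.T  # (num, target_dim)
                return nn.Embedding.from_pretrained(table, freeze=False)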
    Intrinsic Language-Guided Exploration for Complex Long-Horizon Robotic Manipulation Tasks. (arXiv:2309.16347v1 [cs.RO])
    Current reinforcement learning algorithms struggle in sparse and complex environments, most notably in long-horizon manipulation tasks entailing a plethora of different sequences. In this work, we propose the Intrinsically Guided Exploration from Large Language Models (IGE-LLMs) framework. By leveraging LLMs as an assistive intrinsic reward, IGE-LLMs guides the exploratory process in reinforcement learning to address intricate long-horizon with sparse rewards robotic manipulation tasks. We evaluate our framework and related intrinsic learning methods in an environment challenged with exploration, and a complex robotic manipulation task challenged by both exploration and long-horizons. Results show IGE-LLMs (i) exhibit notably higher performance over related intrinsic methods and the direct use of LLMs in decision-making, (ii) can be combined and complement existing learning methods highlighting its modularity, (iii) are fairly insensitive to different intrinsic scaling parameters, and (iv) maintain robustness against increased levels of uncertainty and horizons.
    HyperPPO: A scalable method for finding small policies for robotic control. (arXiv:2309.16663v1 [cs.RO])
    Models with fewer parameters are necessary for the neural control of memory-limited, performant robots. Finding these smaller neural network architectures can be time-consuming. We propose HyperPPO, an on-policy reinforcement learning algorithm that utilizes graph hypernetworks to estimate the weights of multiple neural architectures simultaneously. Our method estimates weights for networks that are much smaller than those in common-use networks yet encode highly performant policies. We obtain multiple trained policies at the same time while maintaining sample efficiency and provide the user the choice of picking a network architecture that satisfies their computational constraints. We show that our method scales well - more training resources produce faster convergence to higher-performing architectures. We demonstrate that the neural policies estimated by HyperPPO are capable of decentralized control of a Crazyflie2.1 quadrotor. Website: https://sites.google.com/usc.edu/hyperppo
    Identifying factors associated with fast visual field progression in patients with ocular hypertension based on unsupervised machine learning. (arXiv:2309.15867v1 [cs.LG])
    Purpose: To identify ocular hypertension (OHT) subtypes with different trends of visual field (VF) progression based on unsupervised machine learning and to discover factors associated with fast VF progression. Participants: A total of 3133 eyes of 1568 ocular hypertension treatment study (OHTS) participants with at least five follow-up VF tests were included in the study. Methods: We used a latent class mixed model (LCMM) to identify OHT subtypes using standard automated perimetry (SAP) mean deviation (MD) trajectories. We characterized the subtypes based on demographic, clinical, ocular, and VF factors at the baseline. We then identified factors driving fast VF progression using generalized estimating equations (GEE) and justified findings qualitatively and quantitatively. Results: The LCMM model discovered four clusters (subtypes) of eyes with different trajectories of MD worsening. The numbers of eyes in the clusters were 794 (25%), 1675 (54%), 531 (17%), and 133 (4%). We labelled the clusters as Improvers, Stables, Slow progressors, and Fast progressors based on their mean rates of MD decline, which were 0.08, -0.06, -0.21, and -0.45 dB/year, respectively. Eyes with fast VF progression had higher baseline age, intraocular pressure (IOP), pattern standard deviation (PSD) and refractive error (RE), but lower central corneal thickness (CCT). Fast progression was associated with calcium channel blockers, being male, heart disease history, diabetes history, African American race, stroke history, and migraine headaches.
    A novel approach to measuring patent claim scope based on probabilities obtained from (large) language models. (arXiv:2309.10003v2 [cs.CL] UPDATED)
    This work proposes to measure the scope of a patent claim as the reciprocal of the self-information contained in this claim. A probability of occurrence of the claim is obtained from a language model and this probability is used to compute the self-information. Grounded in information theory, this approach is based on the assumption that an unlikely concept is more informative than a usual concept, insofar as it is more surprising. In turn, the more surprising the information required to define the claim, the narrower its scope. Five language models are considered, ranging from the simplest models (each word or character is assigned an identical probability) to intermediate models (using average word or character frequencies), to a large language model (GPT2). Interestingly, the scope resulting from the simplest language models is proportional to the reciprocal of the number of words or characters involved in the claim, a metric already used in previous works. Application is made to multiple series of patent claims directed to distinct inventions, where each series consists of claims devised to have a gradually decreasing scope. The performance of the language models is assessed with respect to several ad hoc tests. The more sophisticated the model, the better the results: the GPT2 probability model outperforms models based on word and character frequencies, which themselves outdo the simplest models based on word or character counts. Still, the character count appears to be a more reliable indicator than the word count.
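    A rough sketch of the GPT2-based variant of this measure (illustrative; the exact tokenization, normalization, and logarithm base used in the paper may differ):

        import math
        import torch
        from transformers import GPT2LMHeadModel, GPT2TokenizerFast

        tok = GPT2TokenizerFast.from_pretrained("gpt2")
        lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

        def claim_scope(claim_text):
            """Scope ~ 1 / self-information of the claim, i.e. 1 / (-log2 P(claim)),
            with the probability of the claim text taken from the language model."""
            ids = tok(claim_text, return_tensors="pt").input_ids
            with torch.no_grad():
                out = lm(ids, labels=ids)           # mean next-token NLL (in nats)
            n_predicted = ids.shape[1] - 1
            self_info_bits = out.loss.item() * n_predicted / math.log(2)
            return 1.0 / self_info_bits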
    GNNHLS: Evaluating Graph Neural Network Inference via High-Level Synthesis. (arXiv:2309.16022v1 [cs.LG])
    With the ever-growing popularity of Graph Neural Networks (GNNs), efficient GNN inference is gaining tremendous attention. Field-Programming Gate Arrays (FPGAs) are a promising execution platform due to their fine-grained parallelism, low-power consumption, reconfigurability, and concurrent execution. Even better, High-Level Synthesis (HLS) tools bridge the gap between the non-trivial FPGA development efforts and rapid emergence of new GNN models. In this paper, we propose GNNHLS, an open-source framework to comprehensively evaluate GNN inference acceleration on FPGAs via HLS, containing a software stack for data generation and baseline deployment, and FPGA implementations of 6 well-tuned GNN HLS kernels. We evaluate GNNHLS on 4 graph datasets with distinct topologies and scales. The results show that GNNHLS achieves up to 50.8x speedup and 423x energy reduction relative to the CPU baselines. Compared with the GPU baselines, GNNHLS achieves up to 5.16x speedup and 74.5x energy reduction.
    Abdominal multi-organ segmentation in CT using Swinunter. (arXiv:2309.16210v1 [eess.IV])
    Abdominal multi-organ segmentation in computed tomography (CT) is crucial for many clinical applications, including disease detection and treatment planning. Deep learning methods have shown unprecedented performance in this regard. However, it is still quite challenging to accurately segment different organs using a single network due to the vague boundaries of organs, the complex background, and the substantially different organ size scales. In this work, we used a transformer-based model for training. Previous years' competitions showed that essentially all of the top-5 methods were CNN-based, likely because the limited data volume prevented transformer-based methods from reaching their full potential. The thousands of samples in this competition may enable the transformer-based model to achieve better results. The results on the public validation set also show that the transformer-based model can achieve an acceptable result and inference time.
    ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers. (arXiv:2309.16119v1 [cs.LG])
    We propose a memory-efficient finetuning algorithm for large language models (LLMs) that supports finetuning LLMs with 65B parameters in 3-bit or 4-bit precision on as little as one 48GB GPU. Our method, modular low-rank adaptation (ModuLoRA), integrates any user-specified weight quantizer with finetuning via low-rank adapters (LoRAs). Our approach relies on a simple quantization-agnostic backward pass that adaptively materializes low-precision LLM weights from a custom black-box quantization module. This approach enables finetuning 3-bit LLMs for the first time; leveraging state-of-the-art 3-bit OPTQ quantization, it often outperforms finetuning that relies on less sophisticated 4-bit and 8-bit methods. In our experiments, ModuLoRA attains competitive performance on text classification, natural language inference, and instruction following tasks using significantly less memory than existing approaches, and we also surpass the state-of-the-art ROUGE score on a popular summarization task. We release ModuLoRA together with a series of low-precision models--including the first family of 3-bit instruction following Alpaca LLMs--as part of LLMTOOLS, a user-friendly library for quantizing, running, and finetuning LLMs on consumer GPUs.
    Advancing Federated Learning in 6G: A Trusted Architecture with Graph-based Analysis. (arXiv:2309.05525v3 [cs.NI] UPDATED)
    Integrating native AI support into the network architecture is an essential objective of 6G. Federated Learning (FL) emerges as a potential paradigm, facilitating decentralized AI model training across a diverse range of devices under the coordination of a central server. However, several challenges hinder its wide application in the 6G context, such as malicious attacks and privacy snooping on local model updates, and centralization pitfalls. This work proposes a trusted architecture for supporting FL, which utilizes Distributed Ledger Technology (DLT) and Graph Neural Network (GNN), including three key features. First, a pre-processing layer employing homomorphic encryption is incorporated to securely aggregate local models, preserving the privacy of individual models. Second, given the distributed nature and graph structure between clients and nodes in the pre-processing layer, GNN is leveraged to identify abnormal local models, enhancing system security. Third, DLT is utilized to decentralize the system by selecting one of the candidates to perform the central server's functions. Additionally, DLT ensures reliable data management by recording data exchanges in an immutable and transparent ledger. The feasibility of the novel architecture is validated through simulations, demonstrating improved performance in anomalous model detection and global model accuracy compared to relevant baselines.
    Data Augmentation in the Underparameterized and Overparameterized Regimes. (arXiv:2202.09134v3 [cs.LG] UPDATED)
    We provide results that exactly quantify how data augmentation affects the variance and limiting distribution of estimates, and analyze several specific models in detail. The results confirm some observations made in machine learning practice, but also lead to unexpected findings: Data augmentation may increase rather than decrease the uncertainty of estimates, such as the empirical prediction risk. It can act as a regularizer, but fails to do so in certain high-dimensional problems, and it may shift the double-descent peak of an empirical risk. Overall, the analysis shows that several properties that have been attributed to data augmentation are neither simply true nor false, but rather depend on a combination of factors -- notably the data distribution, the properties of the estimator, and the interplay of sample size, number of augmentations, and dimension. Our main theoretical tool is a limit theorem for functions of randomly transformed, high-dimensional random vectors. The proof draws on work in probability on noise stability of functions of many variables.
    Analytical Modelling of Raw Data for Flow-Guided In-body Nanoscale Localization. (arXiv:2309.16034v1 [cs.ET])
    Advancements in nanotechnology and material science are paving the way toward nanoscale devices that combine sensing, computing, data and energy storage, and wireless communication. In precision medicine, these nanodevices show promise for disease diagnostics, treatment, and monitoring from within the patients' bloodstreams. Associating the location of a sensed biological event with the event itself, which is the main proposition of flow-guided in-body nanoscale localization, would be immensely beneficial from the perspective of precision medicine. The nanoscale nature of the nanodevices and the challenging environment that the bloodstream represents result in current flow-guided localization approaches being constrained in their communication and energy-related capabilities. The communication and energy constraints of the nanodevices result in different features of raw data for flow-guided localization, in turn affecting its performance. An analytical model of how imperfect communication and energy-constrained, intermittent operation of the nanodevices affect the raw data they produce would therefore be beneficial. Hence, we propose an analytical model of raw data for flow-guided localization, where the raw data is modeled as a function of communication and energy-related capabilities of the nanodevice. We evaluate the model by comparing its output with the one obtained through the utilization of a simulator for objective evaluation of flow-guided localization, which features a comparably higher level of realism. Our results across a number of scenarios and heterogeneous performance metrics indicate high similarity between the model and simulator-generated raw datasets.
    RLLTE: Long-Term Evolution Project of Reinforcement Learning. (arXiv:2309.16382v1 [cs.AI])
    We present RLLTE: a long-term evolution, extremely modular, and open-source framework for reinforcement learning (RL) research and application. Beyond delivering top-notch algorithm implementations, RLLTE also serves as a toolkit for developing algorithms. More specifically, RLLTE decouples the RL algorithms completely from the exploitation-exploration perspective, providing a large number of components to accelerate algorithm development and evolution. In particular, RLLTE is the first RL framework to build a complete and luxuriant ecosystem, which includes model training, evaluation, deployment, benchmark hub, and large language model (LLM)-empowered copilot. RLLTE is expected to set standards for RL engineering practice and be highly stimulative for industry and academia.
    Flexible and efficient spatial extremes emulation via variational autoencoders. (arXiv:2307.08079v2 [stat.ML] UPDATED)
    Many real-world processes have complex tail dependence structures that cannot be characterized using classical Gaussian processes. More flexible spatial extremes models exhibit appealing extremal dependence properties but are often computationally prohibitive to fit and simulate from in high dimensions. In this paper, we develop a new spatial extremes model that has flexible and non-stationary dependence properties, and we integrate it in the encoding-decoding structure of a variational autoencoder (XVAE), whose parameters are estimated via variational Bayes combined with deep learning. The XVAE can be used as a spatio-temporal emulator that characterizes the distribution of potential mechanistic model output states and produces outputs that have the same statistical properties as the inputs, especially in the tail. As an aside, our approach also provides a novel way of making fast inference with complex extreme-value processes. Through extensive simulation studies, we show that our XVAE is substantially more time-efficient than traditional Bayesian inference while also outperforming many spatial extremes models with a stationary dependence structure. To further demonstrate the computational power of the XVAE, we analyze a high-resolution satellite-derived dataset of sea surface temperature in the Red Sea, which includes 30 years of daily measurements at 16703 grid cells. We find that the extremal dependence strength is weaker in the interior of the Red Sea and has decreased slightly over time.
    DPA-WNO: A gray box model for a class of stochastic mechanics problem. (arXiv:2309.15128v2 [cs.LG] UPDATED)
    The well-known governing physics in science and engineering is often based on certain assumptions and approximations. Therefore, analyses and designs carried out based on these equations are also approximate. The emergence of data-driven models has, to a certain degree, addressed this challenge; however, the purely data-driven models often (a) lack interpretability, (b) are data-hungry, and (c) do not generalize beyond the training window. Operator learning has recently been proposed as a potential alternative to address the aforementioned challenges; however, the challenges are still persistent. We here argue that one of the possible solutions resides in data-physics fusion, where the data-driven model is used to correct/identify the missing physics. To that end, we propose a novel Differentiable Physics Augmented Wavelet Neural Operator (DPA-WNO). The proposed DPA-WNO blends a differentiable physics solver with the Wavelet Neural Operator (WNO), where the role of WNO is to model the missing physics. This empowers the proposed framework to exploit the capability of WNO to learn from data while retaining the interpretability and generalizability associated with physics-based solvers. We illustrate the applicability of the proposed approach in solving time-dependent uncertainty quantification problems due to randomness in the initial condition. Four benchmark uncertainty quantification and reliability analysis examples from various fields of science and engineering are solved using the proposed approach. The results presented illustrate interesting features of the proposed approach.
    Lossless Transformations and Excess Risk Bounds in Statistical Inference. (arXiv:2307.16735v2 [cs.IT] UPDATED)
    We study the excess minimum risk in statistical inference, defined as the difference between the minimum expected loss in estimating a random variable from an observed feature vector and the minimum expected loss in estimating the same random variable from a transformation (statistic) of the feature vector. After characterizing lossless transformations, i.e., transformations for which the excess risk is zero for all loss functions, we construct a partitioning test statistic for the hypothesis that a given transformation is lossless and show that for i.i.d. data the test is strongly consistent. More generally, we develop information-theoretic upper bounds on the excess risk that uniformly hold over fairly general classes of loss functions. Based on these bounds, we introduce the notion of a delta-lossless transformation and give sufficient conditions for a given transformation to be universally delta-lossless. Applications to classification, nonparametric regression, portfolio strategies, information bottleneck, and deep learning, are also surveyed.
    Uncovering Neural Scaling Laws in Molecular Representation Learning. (arXiv:2309.15123v2 [physics.chem-ph] UPDATED)
    Molecular Representation Learning (MRL) has emerged as a powerful tool for drug and materials discovery in a variety of tasks such as virtual screening and inverse design. While there has been a surge of interest in advancing model-centric techniques, the influence of both data quantity and quality on molecular representations is not yet clearly understood within this field. In this paper, we delve into the neural scaling behaviors of MRL from a data-centric viewpoint, examining four key dimensions: (1) data modalities, (2) dataset splitting, (3) the role of pre-training, and (4) model capacity. Our empirical studies confirm a consistent power-law relationship between data volume and MRL performance across these dimensions. Additionally, through detailed analysis, we identify potential avenues for improving learning efficiency. To challenge these scaling laws, we adapt seven popular data pruning strategies to molecular data and benchmark their performance. Our findings underline the importance of data-centric MRL and highlight possible directions for future research.
    Telescope: An Automated Hybrid Forecasting Approach on a Level-Playing Field. (arXiv:2309.15871v1 [cs.LG])
    In many areas of decision-making, forecasting is an essential pillar. Consequently, many different forecasting methods have been proposed. From our experience, recently presented forecasting methods are computationally intensive, poorly automated, tailored to a particular data set, or lack a predictable time-to-result. To this end, we introduce Telescope, a novel machine learning-based forecasting approach that automatically retrieves relevant information from a given time series and splits it into parts, handling each of them separately. In contrast to deep learning methods, our approach requires neither parameterization nor the training and fitting of a multitude of parameters. It operates with just one time series and provides forecasts within seconds without any additional setup. Our experiments show that Telescope outperforms recent methods by providing accurate and reliable forecasts while making no assumptions about the analyzed time series.
    STAEformer: Spatio-Temporal Adaptive Embedding Makes Vanilla Transformer SOTA for Traffic Forecasting. (arXiv:2308.10425v4 [cs.LG] UPDATED)
    With the rapid development of the Intelligent Transportation System (ITS), accurate traffic forecasting has emerged as a critical challenge. The key bottleneck lies in capturing the intricate spatio-temporal traffic patterns. In recent years, numerous neural networks with complicated architectures have been proposed to address this issue. However, the advancements in network architectures have encountered diminishing performance gains. In this study, we present a novel component called spatio-temporal adaptive embedding that can yield outstanding results with vanilla transformers. Our proposed Spatio-Temporal Adaptive Embedding transformer (STAEformer) achieves state-of-the-art performance on five real-world traffic forecasting datasets. Further experiments demonstrate that spatio-temporal adaptive embedding plays a crucial role in traffic forecasting by effectively capturing intrinsic spatio-temporal relations and chronological information in traffic time series.
    Kairos: Practical Intrusion Detection and Investigation using Whole-system Provenance. (arXiv:2308.05034v3 [cs.CR] UPDATED)
    Provenance graphs are structured audit logs that describe the history of a system's execution. Recent studies have explored a variety of techniques to analyze provenance graphs for automated host intrusion detection, focusing particularly on advanced persistent threats. Sifting through their design documents, we identify four common dimensions that drive the development of provenance-based intrusion detection systems (PIDSes): scope (can PIDSes detect modern attacks that infiltrate across application boundaries?), attack agnosticity (can PIDSes detect novel attacks without a priori knowledge of attack characteristics?), timeliness (can PIDSes efficiently monitor host systems as they run?), and attack reconstruction (can PIDSes distill attack activity from large provenance graphs so that sysadmins can easily understand and quickly respond to system intrusion?). We present KAIROS, the first PIDS that simultaneously satisfies the desiderata in all four dimensions, whereas existing approaches sacrifice at least one and struggle to achieve comparable detection performance. Kairos leverages a novel graph neural network-based encoder-decoder architecture that learns the temporal evolution of a provenance graph's structural changes to quantify the degree of anomalousness for each system event. Then, based on this fine-grained information, Kairos reconstructs attack footprints, generating compact summary graphs that accurately describe malicious activity over a stream of system audit logs. Using state-of-the-art benchmark datasets, we demonstrate that Kairos outperforms previous approaches.
    Developing a Philosophical Framework for Fair Machine Learning: Lessons From The Case of Algorithmic Collusion. (arXiv:2208.06308v2 [cs.LG] UPDATED)
    Fair machine learning research has been primarily concerned with classification tasks that result in discrimination. However, as machine learning algorithms are applied in new contexts the harms and injustices that result are qualitatively different than those presently studied. The existing research paradigm in machine learning which develops metrics and definitions of fairness cannot account for these qualitatively different types of injustice. One example of this is the problem of algorithmic collusion and market fairness. The negative consequences of algorithmic collusion affect all consumers, not only particular members of a protected class. Drawing on this case study, I propose an ethical framework for researchers and practitioners in machine learning seeking to develop and apply fairness metrics that extends to new domains. This contribution ties the development of formal metrics of fairness to specifically scoped normative principles. This enables fairness metrics to reflect different concerns from discrimination. I conclude with the limitations of my proposal and discuss promising avenues for future research.
    Creating walls to avoid unwanted points in root finding and optimization. (arXiv:2309.11475v2 [math.OC] UPDATED)
    In root finding and optimization, there are many cases where there is a closed set $A$ to which one would like the sequence constructed by one's favourite method not to converge (here, we do not assume extra properties of $A$ such as being convex or connected). For example, if one wants to find roots and chooses initial points in the basin of attraction of one root $x^*$ (a fact which one may not know beforehand), then one will always end up at that root. In this case, one would like a mechanism for avoiding the point $x^*$ in subsequent runs of the algorithm. In this paper, we propose two new methods aiming to achieve this. In the first method, we divide the cost function by an appropriate power of the distance function to $A$. This idea is inspired by how one would try to find all roots of a function of one variable. In the second method, which is more suitable for constrained optimization, we redefine the value of the function to be a big constant on $A$. We also propose, based on this, an algorithm to escape the basin of attraction of a component of positive dimension to reach another component. As an application, we prove a rigorous guarantee for finding roots of a meromorphic function of one complex variable in a given domain. Along the way, we compare with the main existing relevant methods in the current literature. We provide several examples in various settings to illustrate the usefulness of the new approach.
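    A minimal sketch of the first method, where the cost is divided by a power of the distance to the points one wants to avoid (illustrative; the choice of power and the precise handling of the distance in the paper may differ):

        import numpy as np

        def walled_cost(f, avoid_points, power=2.0, eps=1e-12):
            """Return a modified cost that blows up near previously found points,
            discouraging convergence back to them."""
            def g(x):
                x = np.atleast_1d(np.asarray(x, dtype=float))
                d = min(np.linalg.norm(x - np.atleast_1d(a)) for a in avoid_points)
                return f(x) / (max(d, eps) ** power)
            return g

        # Example: f has zeros at x = 1 and x = 3; walling off the known zero at
        # x = 1 removes it from the modified cost, so a new run is driven elsewhere.
        f = lambda x: ((x[0] - 1.0) * (x[0] - 3.0)) ** 2
        g = walled_cost(f, avoid_points=[np.array([1.0])])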
    Improving Robustness of Deep Convolutional Neural Networks via Multiresolution Learning. (arXiv:2309.13752v2 [cs.LG] UPDATED)
    The current learning process of deep learning, regardless of the deep neural network (DNN) architecture and/or learning algorithm used, is essentially single-resolution training. We explore multiresolution learning and show that multiresolution learning can significantly improve the robustness of DNN models for both 1D signal and 2D signal (image) prediction problems. We demonstrate this improvement in terms of both noise and adversarial robustness as well as with small training dataset size. Our results also suggest that it may not be necessary to trade standard accuracy for robustness with multiresolution learning, which is, interestingly, contrary to the observation obtained from the traditional single-resolution learning setting.
    HACMan: Learning Hybrid Actor-Critic Maps for 6D Non-Prehensile Manipulation. (arXiv:2305.03942v3 [cs.RO] UPDATED)
    Manipulating objects without grasping them is an essential component of human dexterity, referred to as non-prehensile manipulation. Non-prehensile manipulation may enable more complex interactions with the objects, but also presents challenges in reasoning about gripper-object interactions. In this work, we introduce Hybrid Actor-Critic Maps for Manipulation (HACMan), a reinforcement learning approach for 6D non-prehensile manipulation of objects using point cloud observations. HACMan proposes a temporally-abstracted and spatially-grounded object-centric action representation that consists of selecting a contact location from the object point cloud and a set of motion parameters describing how the robot will move after making contact. We modify an existing off-policy RL algorithm to learn in this hybrid discrete-continuous action representation. We evaluate HACMan on a 6D object pose alignment task in both simulation and in the real world. On the hardest version of our task, with randomized initial poses, randomized 6D goals, and diverse object categories, our policy demonstrates strong generalization to unseen object categories without a performance drop, achieving an 89% success rate on unseen objects in simulation and 50% success rate with zero-shot transfer in the real world. Compared to alternative action representations, HACMan achieves a success rate more than three times higher than the best baseline. With zero-shot sim2real transfer, our policy can successfully manipulate unseen objects in the real world for challenging non-planar goals, using dynamic and contact-rich non-prehensile skills. Videos can be found on the project website: https://hacman-2023.github.io.
    A Graph Neural Network-Based QUBO-Formulated Hamiltonian-Inspired Loss Function for Combinatorial Optimization using Reinforcement Learning. (arXiv:2308.13978v2 [cs.AI] UPDATED)
    Quadratic Unconstrained Binary Optimization (QUBO) is a generic technique to model various NP-hard combinatorial optimization problems in the form of binary variables. The Hamiltonian function is often used to formulate QUBO problems, where it serves as the objective function in the context of optimization. Recently, PI-GNN, a generic scalable framework, has been proposed to address the Combinatorial Optimization (CO) problems over graphs based on a simple Graph Neural Network (GNN) architecture. Their novel contribution was a generic QUBO-formulated Hamiltonian-inspired loss function that was optimized using GNN. In this study, we address a crucial issue with the aforementioned setup that is especially observed in denser graphs. The reinforcement learning-based paradigm has also been widely used to address numerous CO problems. Here we also formulate and empirically evaluate the compatibility of the QUBO-formulated Hamiltonian as a generic reward function in the reinforcement learning paradigm, directly integrating the actual node projection status during training in the form of rewards. In our experiments, we observed up to 44% improvement in the RL-based setup compared to the PI-GNN algorithm. Our implementation can be found at https://github.com/rizveeredwan/learning-graph-structure.
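    A minimal sketch of a QUBO-formulated Hamiltonian used as a differentiable loss over relaxed node assignments, in the spirit of the setup above; the same quantity, evaluated on the projected (rounded) assignment, can serve as a negative reward in the RL formulation. Variable names are illustrative:

        import torch

        def qubo_hamiltonian(Q, p):
            """H(x) = x^T Q x for a binary assignment x; here p in [0, 1]^n is a
            relaxed assignment (e.g. per-node probabilities output by a GNN), so
            the Hamiltonian is differentiable and usable as a training loss."""
            return p @ Q @ p

        # loss for gradient-based training of the GNN outputs:
        #   loss = qubo_hamiltonian(Q, node_probs)
        # reward for an RL agent acting on the projected node assignment:
        #   reward = -qubo_hamiltonian(Q, (node_probs > 0.5).float())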
    Temporal Graph Benchmark for Machine Learning on Temporal Graphs. (arXiv:2307.01026v2 [cs.LG] UPDATED)
    We present the Temporal Graph Benchmark (TGB), a collection of challenging and diverse benchmark datasets for realistic, reproducible, and robust evaluation of machine learning models on temporal graphs. TGB datasets are of large scale, spanning years in duration, incorporate both node and edge-level prediction tasks and cover a diverse set of domains including social, trade, transaction, and transportation networks. For both tasks, we design evaluation protocols based on realistic use-cases. We extensively benchmark each dataset and find that the performance of common models can vary drastically across datasets. In addition, on dynamic node property prediction tasks, we show that simple methods often achieve superior performance compared to existing temporal graph models. We believe that these findings open up opportunities for future research on temporal graphs. Finally, TGB provides an automated machine learning pipeline for reproducible and accessible temporal graph research, including data loading, experiment setup and performance evaluation. TGB will be maintained and updated on a regular basis and welcomes community feedback. TGB datasets, data loaders, example codes, evaluation setup, and leaderboards are publicly available at https://tgb.complexdatalab.com/.
    Model Sparsity Can Simplify Machine Unlearning. (arXiv:2304.04934v8 [cs.LG] UPDATED)
    In response to recent data regulation requirements, machine unlearning (MU) has emerged as a critical process to remove the influence of specific examples from a given model. Although exact unlearning can be achieved through complete model retraining using the remaining dataset, the associated computational costs have driven the development of efficient, approximate unlearning techniques. Moving beyond data-centric MU approaches, our study introduces a novel model-based perspective: model sparsification via weight pruning, which is capable of reducing the gap between exact unlearning and approximate unlearning. We show in both theory and practice that model sparsity can boost the multi-criteria unlearning performance of an approximate unlearner, closing the approximation gap, while continuing to be efficient. This leads to a new MU paradigm, termed prune first, then unlearn, which infuses a sparse model prior into the unlearning process. Building on this insight, we also develop a sparsity-aware unlearning method that utilizes sparsity regularization to enhance the training process of approximate unlearning. Extensive experiments show that our proposals consistently benefit MU in various unlearning scenarios. A notable highlight is the 77% unlearning efficacy gain of fine-tuning (one of the simplest unlearning methods) when using sparsity-aware unlearning. Furthermore, we demonstrate the practical impact of our proposed MU methods in addressing other machine learning challenges, such as defending against backdoor attacks and enhancing transfer learning. Codes are available at https://github.com/OPTML-Group/Unlearn-Sparse.
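    An illustrative, hedged sketch of the "prune first, then unlearn" recipe, using global magnitude pruning followed by one of the simplest approximate unlearners, fine-tuning on the retained data only (the paper studies several pruners and unlearning methods; this is not their exact procedure):

        import torch
        import torch.nn.utils.prune as prune

        def prune_then_unlearn(model, retain_loader, sparsity=0.9, epochs=3, lr=1e-4):
            # 1) Sparsify: global L1 (magnitude) pruning over all weight tensors.
            params = [(m, "weight") for m in model.modules()
                      if isinstance(m, (torch.nn.Linear, torch.nn.Conv2d))]
            prune.global_unstructured(params, pruning_method=prune.L1Unstructured,
                                      amount=sparsity)
            # 2) Approximate unlearning: fine-tune the sparse model on the retain
            #    set only, simply excluding the forget set from training.
            opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
            loss_fn = torch.nn.CrossEntropyLoss()
            model.train()
            for _ in range(epochs):
                for x, y in retain_loader:
                    opt.zero_grad()
                    loss_fn(model(x), y).backward()
                    opt.step()
            return model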
    Convergence of AdaGrad for Non-convex Objectives: Simple Proofs and Relaxed Assumptions. (arXiv:2305.18471v2 [cs.LG] UPDATED)
    We provide a simple convergence proof for AdaGrad optimizing non-convex objectives under only affine noise variance and bounded smoothness assumptions. The proof is essentially based on a novel auxiliary function $\xi$ that helps eliminate the complexity of handling the correlation between the numerator and denominator of AdaGrad's update. Leveraging simple proofs, we are able to obtain tighter results than existing results \citep{faw2022power} and extend the analysis to several new and important cases. Specifically, for the over-parameterized regime, we show that AdaGrad needs only $\mathcal{O}(\frac{1}{\varepsilon^2})$ iterations to ensure the gradient norm smaller than $\varepsilon$, which matches the rate of SGD and significantly tighter than existing rates $\mathcal{O}(\frac{1}{\varepsilon^4})$ for AdaGrad. We then discard the bounded smoothness assumption and consider a realistic assumption on smoothness called $(L_0,L_1)$-smooth condition, which allows local smoothness to grow with the gradient norm. Again based on the auxiliary function $\xi$, we prove that AdaGrad succeeds in converging under $(L_0,L_1)$-smooth condition as long as the learning rate is lower than a threshold. Interestingly, we further show that the requirement on learning rate under the $(L_0,L_1)$-smooth condition is necessary via proof by contradiction, in contrast with the case of uniform smoothness conditions where convergence is guaranteed regardless of learning rate choices. Together, our analyses broaden the understanding of AdaGrad and demonstrate the power of the new auxiliary function in the investigations of AdaGrad.
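    For reference, the AdaGrad-style update analyzed in this line of work can be written (here in its norm version; the coordinate-wise variant accumulates squared gradients per coordinate, and the exact variant treated in the paper may differ) as

        $x_{t+1} = x_t - \dfrac{\eta}{\sqrt{v_0 + \sum_{s=1}^{t} \|g_s\|^2}} \, g_t,$

    where $g_s$ denotes the stochastic gradient at step $s$, $\eta$ the learning rate, and $v_0 > 0$ an initial accumulator value.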
    Transformer-VQ: Linear-Time Transformers via Vector Quantization. (arXiv:2309.16354v1 [cs.LG])
    We introduce Transformer-VQ, a decoder-only transformer computing softmax-based dense self-attention in linear time. Transformer-VQ's efficient attention is enabled by vector-quantized keys and a novel caching mechanism. In large-scale experiments, Transformer-VQ is shown highly competitive in quality, with strong results on Enwik8 (0.99 bpb), PG-19 (26.6 ppl), and ImageNet64 (3.16 bpb). Code: https://github.com/transformer-vq/transformer_vq
    Private Stochastic Optimization With Large Worst-Case Lipschitz Parameter: Optimal Rates for (Non-Smooth) Convex Losses and Extension to Non-Convex Losses. (arXiv:2209.07403v4 [cs.LG] UPDATED)
    We study differentially private (DP) stochastic optimization (SO) with loss functions whose worst-case Lipschitz parameter over all data points may be extremely large. To date, the vast majority of work on DP SO assumes that the loss is uniformly Lipschitz continuous over data (i.e. stochastic gradients are uniformly bounded over all data points). While this assumption is convenient, it often leads to pessimistic excess risk bounds. In many practical problems, the worst-case (uniform) Lipschitz parameter of the loss over all data points may be extremely large due to outliers. In such cases, the error bounds for DP SO, which scale with the worst-case Lipschitz parameter of the loss, are vacuous. To address these limitations, this work provides near-optimal excess risk bounds that do not depend on the uniform Lipschitz parameter of the loss. Building on a recent line of work (Wang et al., 2020; Kamath et al., 2022), we assume that stochastic gradients have bounded $k$-th order moments for some $k \geq 2$. Compared with works on uniformly Lipschitz DP SO, our excess risk scales with the $k$-th moment bound instead of the uniform Lipschitz parameter of the loss, allowing for significantly faster rates in the presence of outliers and/or heavy-tailed data. For convex and strongly convex loss functions, we provide the first asymptotically optimal excess risk bounds (up to a logarithmic factor). In contrast to (Wang et al., 2020; Kamath et al., 2022), our bounds do not require the loss function to be differentiable/smooth. We also devise a linear-time algorithm for smooth losses that has excess risk that is tight in certain practical parameter regimes. Additionally, our work is the first to address non-convex non-uniformly Lipschitz loss functions satisfying the Proximal-PL inequality; this covers some practical machine learning models. Our Proximal-PL algorithm has near-optimal excess risk.
    Disambiguated Attention Embedding for Multi-Instance Partial-Label Learning. (arXiv:2305.16912v2 [cs.LG] UPDATED)
    In many real-world tasks, the concerned objects can be represented as a multi-instance bag associated with a candidate label set, which consists of one ground-truth label and several false positive labels. Multi-instance partial-label learning (MIPL) is a learning paradigm to deal with such tasks and has achieved favorable performances. Existing MIPL approaches follow the instance-space paradigm by assigning augmented candidate label sets of bags to each instance and aggregating bag-level labels from instance-level labels. However, this scheme may be suboptimal as global bag-level information is ignored and the predicted labels of bags are sensitive to predictions of negative instances. In this paper, we study an alternative scheme where a multi-instance bag is embedded into a single vector representation. Accordingly, an intuitive algorithm named DEMIPL, i.e., Disambiguated attention Embedding for Multi-Instance Partial-Label learning, is proposed. DEMIPL employs a disambiguation attention mechanism to aggregate a multi-instance bag into a single vector representation, followed by a momentum-based disambiguation strategy to identify the ground-truth label from the candidate label set. Furthermore, we introduce a real-world MIPL dataset for colorectal cancer classification. Experimental results on benchmark and real-world datasets validate the superiority of DEMIPL against the compared MIPL and partial-label learning approaches.
    Online Distribution Shift Detection via Recency Prediction. (arXiv:2211.09916v3 [cs.RO] UPDATED)
    When deploying modern machine learning-enabled robotic systems in high-stakes applications, detecting distribution shift is critical. However, most existing methods for detecting distribution shift are not well-suited to robotics settings, where data often arrives in a streaming fashion and may be very high-dimensional. In this work, we present an online method for detecting distribution shift with guarantees on the false positive rate - i.e., when there is no distribution shift, our system is very unlikely (with probability $< \epsilon$) to falsely issue an alert; any alerts that are issued should therefore be heeded. Our method is specifically designed for efficient detection even with high dimensional data, and it empirically achieves up to 11x faster detection on realistic robotics settings compared to prior work while maintaining a low false negative rate in practice (whenever there is a distribution shift in our experiments, our method indeed emits an alert). We demonstrate our approach in both simulation and hardware for a visual servoing task, and show that our method indeed issues an alert before a failure occurs.
    Is My Prediction Arbitrary? Confounding Effects of Variance in Fair Classification. (arXiv:2301.11562v5 [cs.LG] UPDATED)
    Variance in predictions across different trained models is a significant, under-explored source of error in fair classification. In practice, the variance on some data examples is so large that decisions can be effectively arbitrary. To investigate this problem, we take an experimental approach and make four overarching contributions: We 1) Define a metric called self-consistency, derived from variance, which we use as a proxy for measuring and reducing arbitrariness; 2) Develop an ensembling algorithm that abstains from classification when a prediction would be arbitrary; 3) Conduct the largest to-date empirical study of the role of variance (vis-a-vis self-consistency and arbitrariness) in fair classification; and, 4) Release a toolkit that makes the US Home Mortgage Disclosure Act (HMDA) datasets easily usable for future research. Altogether, our experiments reveal shocking insights about the reliability of conclusions on benchmark datasets. Most fairness classification benchmarks are close-to-fair when taking into account the amount of arbitrariness present in predictions -- before we even try to apply common fairness interventions. This finding calls into question the practical utility of common algorithmic fairness methods, and in turn suggests that we should fundamentally reconsider how we choose to measure fairness in machine learning.
    On Learning with LAD. (arXiv:2309.16630v1 [cs.LG])
    The logical analysis of data, LAD, is a technique that yields two-class classifiers based on Boolean functions having disjunctive normal form (DNF) representation. Although LAD algorithms employ optimization techniques, the resulting binary classifiers or binary rules do not lead to overfitting. We propose a theoretical justification for the absence of overfitting by estimating the Vapnik-Chervonenkis dimension (VC dimension) for LAD models where hypothesis sets consist of DNFs with a small number of cubic monomials. We illustrate and confirm our observations empirically.
    Open Source Infrastructure for Differentiable Density Functional Theory. (arXiv:2309.15985v1 [cs.LG])
    Learning exchange correlation functionals, used in quantum chemistry calculations, from data has become increasingly important in recent years, but training such a functional requires sophisticated software infrastructure. For this reason, we build open source infrastructure to train neural exchange correlation functionals. We aim to standardize the processing pipeline by adapting state-of-the-art techniques from work done by multiple groups. We have open sourced the model in the DeepChem library to provide a platform for additional research on differentiable quantum chemistry methods.
    Dice Semimetric Losses: Optimizing the Dice Score with Soft Labels. (arXiv:2303.16296v3 [cs.CV] UPDATED)
    The soft Dice loss (SDL) has taken a pivotal role in numerous automated segmentation pipelines in the medical imaging community. Over the last years, some reasons behind its superior functioning have been uncovered and further optimizations have been explored. However, there is currently no implementation that supports its direct utilization in scenarios involving soft labels. Hence, a synergy between the use of SDL and research leveraging the use of soft labels, also in the context of model calibration, is still missing. In this work, we introduce Dice semimetric losses (DMLs), which (i) are by design identical to SDL in a standard setting with hard labels, but (ii) can be employed in settings with soft labels. Our experiments on the public QUBIQ, LiTS and KiTS benchmarks confirm the potential synergy of DMLs with soft labels (e.g.\ averaging, label smoothing, and knowledge distillation) over hard labels (e.g.\ majority voting and random selection). As a result, we obtain superior Dice scores and model calibration, which supports the wider adoption of DMLs in practice. The code is available at \href{https://github.com/zifuwanggg/JDTLosses}{https://github.com/zifuwanggg/JDTLosses}.
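    As a point of reference, the sketch below shows the standard soft Dice loss that the proposed Dice semimetric losses coincide with when labels are hard one-hot maps; the actual semimetric construction for soft labels is given in the paper and the linked repository, and the smoothing constant here is illustrative rather than taken from that work.

```python
import torch

def soft_dice_loss(probs, target, eps=1e-6):
    """Standard soft Dice loss; matches the DMLs described above only in the
    hard-label setting. probs, target: (batch, classes, *spatial) tensors."""
    dims = tuple(range(2, probs.ndim))                  # sum over spatial dims
    intersection = (probs * target).sum(dims)
    denominator = probs.sum(dims) + target.sum(dims)
    dice = (2 * intersection + eps) / (denominator + eps)
    return 1 - dice.mean()
```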
    Unsupervised Discovery of Extreme Weather Events Using Universal Representations of Emergent Organization. (arXiv:2304.12586v2 [physics.comp-ph] UPDATED)
    Spontaneous self-organization is ubiquitous in systems far from thermodynamic equilibrium. While organized structures that emerge dominate transport properties, universal representations that identify and describe these key objects remain elusive. Here, we introduce a theoretically-grounded framework for describing emergent organization that, via data-driven algorithms, is constructive in practice. Its building blocks are spacetime lightcones that embody how information propagates across a system through local interactions. We show that predictive equivalence classes of lightcones -- local causal states -- capture organized behaviors and coherent structures in complex spatiotemporal systems. Employing an unsupervised physics-informed machine learning algorithm and a high-performance computing implementation, we demonstrate automatically discovering coherent structures in two real world domain science problems. We show that local causal states identify vortices and track their power-law decay behavior in two-dimensional fluid turbulence. We then show how to detect and track familiar extreme weather events -- hurricanes and atmospheric rivers -- and discover other novel coherent structures associated with precipitation extremes in high-resolution climate data at the grid-cell level.
    Generating Personalized Insulin Treatments Strategies with Deep Conditional Generative Time Series Models. (arXiv:2309.16521v1 [stat.ML])
    We propose a novel framework that combines deep generative time series models with decision theory for generating personalized treatment strategies. It leverages historical patient trajectory data to jointly learn the generation of realistic personalized treatment and future outcome trajectories through deep generative time series models. In particular, our framework enables the generation of novel multivariate treatment strategies tailored to the personalized patient history and trained for optimal expected future outcomes based on conditional expected utility maximization. We demonstrate our framework by generating personalized insulin treatment strategies and blood glucose predictions for hospitalized diabetes patients, showcasing the potential of our approach for generating improved personalized treatment strategies. Keywords: deep generative model, probabilistic decision support, personalized treatment generation, insulin and blood glucose prediction
    Horospherical Decision Boundaries for Large Margin Classification in Hyperbolic Space. (arXiv:2302.06807v3 [stat.ML] UPDATED)
    Hyperbolic spaces have been quite popular in the recent past for representing hierarchically organized data. Further, several classification algorithms for data in these spaces have been proposed in the literature. These algorithms mainly use either hyperplanes or geodesics as decision boundaries in a large-margin classifier setting, leading to a non-convex optimization problem. In this paper, we propose a novel large-margin classifier based on horospherical decision boundaries that leads to a geodesically convex optimization problem, which can be optimized using any Riemannian gradient descent technique, guaranteeing a globally optimal solution. We present several experiments depicting the competitive performance of our classifier in comparison to SOTA.
    Efficient Adversarial Input Generation via Neural Net Patching. (arXiv:2211.16808v2 [cs.LG] UPDATED)
    The generation of adversarial inputs has become a crucial issue in establishing the robustness and trustworthiness of deep neural nets, especially when they are used in safety-critical application domains such as autonomous vehicles and precision medicine. However, the problem poses multiple practical challenges, including scalability issues owing to large-sized networks, and the generation of adversarial inputs that lack important qualities such as naturalness and output-impartiality. This problem shares its end goal with the task of patching neural nets where small changes in some of the network's weights need to be discovered so that upon applying these changes, the modified net produces the desirable output for a given set of inputs. We exploit this connection by proposing to obtain an adversarial input from a patch, with the underlying observation that the effect of changing the weights can also be brought about by changing the inputs instead. Thus, this paper presents a novel way to generate input perturbations that are adversarial for a given network by using an efficient network patching technique. We note that the proposed method is significantly more effective than the prior state-of-the-art techniques.
    M-OFDFT: Overcoming the Barrier of Orbital-Free Density Functional Theory for Molecular Systems Using Deep Learning. (arXiv:2309.16578v1 [stat.ML])
    Orbital-free density functional theory (OFDFT) is a quantum chemistry formulation that has a lower cost scaling than the prevailing Kohn-Sham DFT, which is increasingly desired for contemporary molecular research. However, its accuracy is limited by the kinetic energy density functional, which is notoriously hard to approximate for non-periodic molecular systems. In this work, we propose M-OFDFT, an OFDFT approach capable of solving molecular systems using a deep-learning functional model. We build the essential nonlocality into the model, which is made affordable by the concise density representation as expansion coefficients under an atomic basis. With techniques to address unconventional learning challenges therein, M-OFDFT achieves a comparable accuracy with Kohn-Sham DFT on a wide range of molecules untouched by OFDFT before. More attractively, M-OFDFT extrapolates well to molecules much larger than those in training, which unleashes the appealing scaling for studying large molecules including proteins, representing an advancement of the accuracy-efficiency trade-off frontier in quantum chemistry.
    D$^3$Fields: Dynamic 3D Descriptor Fields for Zero-Shot Generalizable Robotic Manipulation. (arXiv:2309.16118v1 [cs.RO])
    Scene representation has been a crucial design choice in robotic manipulation systems. An ideal representation should be 3D, dynamic, and semantic to meet the demands of diverse manipulation tasks. However, previous works often lack all three properties simultaneously. In this work, we introduce D$^3$Fields - dynamic 3D descriptor fields. These fields capture the dynamics of the underlying 3D environment and encode both semantic features and instance masks. Specifically, we project arbitrary 3D points in the workspace onto multi-view 2D visual observations and interpolate features derived from foundational models. The resulting fused descriptor fields allow for flexible goal specifications using 2D images with varied contexts, styles, and instances. To evaluate the effectiveness of these descriptor fields, we apply our representation to a wide range of robotic manipulation tasks in a zero-shot manner. Through extensive evaluation in both real-world scenarios and simulations, we demonstrate that D$^3$Fields are both generalizable and effective for zero-shot robotic manipulation tasks. In quantitative comparisons with state-of-the-art dense descriptors, such as Dense Object Nets and DINO, D$^3$Fields exhibit significantly better generalization abilities and manipulation accuracy.
    Resisting Backdoor Attacks in Federated Learning via Bidirectional Elections and Individual Perspective. (arXiv:2309.16456v1 [cs.LG])
    Existing approaches defend against backdoor attacks in federated learning (FL) mainly through a) mitigating the impact of infected models, or b) excluding infected models. The former negatively impacts model accuracy, while the latter usually relies on globally clear boundaries between benign and infected model updates. However, in practice, model updates are easily mixed and scattered due to the diverse distributions of local data. This work focuses on excluding infected models in FL. Unlike previous global-view perspectives, we propose Snowball, a novel anti-backdoor FL framework based on bidirectional elections from an individual perspective, inspired by one principle we deduce and two established principles in FL and deep learning. It is characterized by a) a bottom-up election, where each candidate model update votes for several peers such that a few model updates are elected as selectees for aggregation; and b) a top-down election, where the selectees progressively enlarge themselves by picking up candidates. We compare Snowball with state-of-the-art defenses against backdoor attacks in FL on five real-world datasets, demonstrating its superior resistance to backdoor attacks and only a slight impact on the accuracy of the global model.
    DeepPCR: Parallelizing Sequential Operations in Neural Networks. (arXiv:2309.16318v1 [cs.LG])
    Parallelization techniques have become ubiquitous for accelerating inference and training of deep neural networks. Despite this, several operations are still performed in a sequential manner. For instance, the forward and backward passes are executed layer-by-layer, and the output of diffusion models is produced by applying a sequence of denoising steps. This sequential approach results in a computational cost proportional to the number of steps involved, presenting a potential bottleneck as the number of steps increases. In this work, we introduce DeepPCR, a novel algorithm which parallelizes typically sequential operations used in inference and training of neural networks. DeepPCR is based on interpreting a sequence of $L$ steps as the solution of a specific system of equations, which we recover using the Parallel Cyclic Reduction algorithm. This reduces the complexity of computing the sequential operations from $\mathcal{O}(L)$ to $\mathcal{O}(\log_2L)$, thus yielding a speedup for large $L$. To verify the theoretical lower complexity of the algorithm, and to identify regimes for speedup, we test the effectiveness of DeepPCR in parallelizing the forward and backward pass in multi-layer perceptrons, and reach speedups of up to $30\times$ for forward and $200\times$ for backward pass. We additionally showcase the flexibility of DeepPCR by parallelizing training of ResNets with as many as 1024 layers, and generation in diffusion models, enabling up to $7\times$ faster training and $11\times$ faster generation, respectively, when compared to the sequential approach.
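    To make the core idea concrete, the toy sketch below treats the simplest possible case, a scalar linear recurrence x_i = a_i * x_{i-1} + b_i, as a bidiagonal system and solves it by cyclic reduction; the recursion has O(log L) levels, each of which is parallel over its entries. The neural-network setting in the paper is nonlinear and multivariate, so this only illustrates where the logarithmic depth comes from.

```python
import numpy as np

def sequential_recurrence(a, b):
    """Baseline O(L) loop: x[0] = b[0], x[i] = a[i]*x[i-1] + b[i]."""
    x = np.empty_like(b)
    x[0] = b[0]
    for i in range(1, len(b)):
        x[i] = a[i] * x[i - 1] + b[i]
    return x

def cyclic_reduction(a, b):
    """Same recurrence viewed as a bidiagonal system, solved by cyclic reduction."""
    L = len(b)
    if L == 1:
        return b.copy()
    even = np.arange(2, L, 2)
    a_red = np.empty((L + 1) // 2)
    b_red = np.empty((L + 1) // 2)
    a_red[0], b_red[0] = 0.0, b[0]
    a_red[1:] = a[even] * a[even - 1]                # eliminate odd-indexed unknowns
    b_red[1:] = a[even] * b[even - 1] + b[even]
    x = np.empty(L)
    x[0::2] = cyclic_reduction(a_red, b_red)         # solve the half-size subsystem
    odd = np.arange(1, L, 2)
    x[odd] = a[odd] * x[odd - 1] + b[odd]            # parallel back-substitution
    return x

rng = np.random.default_rng(0)
a, b = rng.uniform(-0.9, 0.9, 1024), rng.normal(size=1024)
assert np.allclose(sequential_recurrence(a, b), cyclic_reduction(a, b))
```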
    CRIMED: Lower and Upper Bounds on Regret for Bandits with Unbounded Stochastic Corruption. (arXiv:2309.16563v1 [stat.ML])
    We investigate the regret-minimisation problem in a multi-armed bandit setting with arbitrary corruptions. Similar to the classical setup, the agent receives rewards generated independently from the distribution of the arm chosen at each time. However, these rewards are not directly observed. Instead, with a fixed $\varepsilon\in (0,\frac{1}{2})$, the agent observes a sample from the chosen arm's distribution with probability $1-\varepsilon$, or from an arbitrary corruption distribution with probability $\varepsilon$. Importantly, we impose no assumptions on these corruption distributions, which can be unbounded. In this setting, accommodating potentially unbounded corruptions, we establish a problem-dependent lower bound on regret for a given family of arm distributions. We introduce CRIMED, an asymptotically-optimal algorithm that achieves the exact lower bound on regret for bandits with Gaussian distributions with known variance. Additionally, we provide a finite-sample analysis of CRIMED's regret performance. Notably, CRIMED can effectively handle corruptions with $\varepsilon$ values as high as $\frac{1}{2}$. Furthermore, we develop a tight concentration result for medians in the presence of arbitrary corruptions, even with $\varepsilon$ values up to $\frac{1}{2}$, which may be of independent interest. We also discuss an extension of the algorithm for handling misspecification in the Gaussian model.
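    The observation model is easy to simulate, and doing so shows why median-based statistics are the natural tool here. The snippet below is an illustration of the corrupted-feedback setting only, not of the CRIMED index itself, and the particular corruption law is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupted_rewards(mu, sigma, eps, n, corruption):
    """With prob. 1-eps observe the arm's Gaussian reward; with prob. eps
    observe a draw from an arbitrary (possibly unbounded) corruption law."""
    clean = rng.normal(mu, sigma, size=n)
    dirty = corruption(n)
    return np.where(rng.random(n) < eps, dirty, clean)

obs = corrupted_rewards(mu=1.0, sigma=1.0, eps=0.4, n=100_000,
                        corruption=lambda n: 1e6 * rng.standard_cauchy(n))
# The empirical mean is destroyed by the unbounded corruption, while the
# sample median remains informative about the arm as long as eps < 1/2.
print(np.mean(obs), np.median(obs))
```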
    E2Net: Resource-Efficient Continual Learning with Elastic Expansion Network. (arXiv:2309.16117v1 [cs.LG])
    Continual Learning methods are designed to learn new tasks without erasing previous knowledge. However, Continual Learning often requires massive computational power and storage capacity for satisfactory performance. In this paper, we propose a resource-efficient continual learning method called the Elastic Expansion Network (E2Net). Leveraging core subnet distillation and precise replay sample selection, E2Net achieves superior average accuracy and diminished forgetting within the same computational and storage constraints, all while minimizing processing time. In E2Net, we propose Representative Network Distillation to identify the representative core subnet by assessing parameter quantity and output similarity with the working network, distilling analogous subnets within the working network to mitigate reliance on rehearsal buffers and facilitating knowledge transfer across previous tasks. To enhance storage resource utilization, we then propose Subnet Constraint Experience Replay to optimize rehearsal efficiency through a sample storage strategy based on the structures of representative networks. Extensive experiments, conducted predominantly in cloud environments with diverse datasets and also spanning edge environments, demonstrate that E2Net consistently outperforms state-of-the-art methods. In addition, our method outperforms competitors in terms of both storage and computational requirements.
    Geodesic Regression Characterizes 3D Shape Changes in the Female Brain During Menstruation. (arXiv:2309.16662v1 [cs.CV])
    Women are at higher risk of Alzheimer's and other neurological diseases after menopause, and yet research connecting female brain health to sex hormone fluctuations is limited. We seek to investigate this connection by developing tools that quantify 3D shape changes that occur in the brain during sex hormone fluctuations. Geodesic regression on the space of 3D discrete surfaces offers a principled way to characterize the evolution of a brain's shape. However, in its current form, this approach is too computationally expensive for practical use. In this paper, we propose approximation schemes that accelerate geodesic regression on shape spaces of 3D discrete surfaces. We also provide rules of thumb for when each approximation can be used. We test our approach on synthetic data to quantify the speed-accuracy trade-off of these approximations and show that practitioners can expect a very significant speed-up while sacrificing only a little accuracy. Finally, we apply the method to real brain shape data and produce the first characterization of how the female hippocampus changes shape during the menstrual cycle as a function of progesterone: a characterization made (practically) possible by our approximation schemes. Our work paves the way for comprehensive, practical shape analyses in the fields of bio-medicine and computer vision. Our implementation is publicly available on GitHub: https://github.com/bioshape-lab/my28brains.
    Hierarchical Network Data Analytics Framework for B5G Network Automation: Design and Implementation. (arXiv:2309.16269v1 [cs.NI])
    5G introduced modularized network functions (NFs) to support emerging services in a more flexible and elastic manner. To mitigate the complexity in such modularized NF management, automated network operation and management are indispensable, and thus the 3rd generation partnership project (3GPP) has introduced a network data analytics function (NWDAF). However, a conventional NWDAF needs to conduct both inference and training tasks, and thus it is difficult to provide the analytics results to NFs in a timely manner for an increased number of analytics requests. In this article, we propose a hierarchical network data analytics framework (H-NDAF) where inference tasks are distributed to multiple leaf NWDAFs and training tasks are conducted at the root NWDAF. Extensive simulation results using open-source software (i.e., free5GC) demonstrate that H-NDAF can provide sufficiently accurate analytics and faster analytics provision time compared to the conventional NWDAF.
    Projection based fuzzy least squares twin support vector machine for class imbalance problems. (arXiv:2309.15886v1 [cs.LG])
    Class imbalance is a major problem in many real-world classification tasks. Due to the imbalance in the number of samples, the support vector machine (SVM) classifier gets biased toward the majority class. Furthermore, these samples are often observed with a certain degree of noise. Therefore, to address these problems, we propose a novel fuzzy-based approach to deal with class-imbalanced as well as noisy datasets. We propose two approaches to address these problems. The first approach is based on intuitionistic fuzzy membership, termed robust energy-based intuitionistic fuzzy least squares twin support vector machine (IF-RELSTSVM). Furthermore, we introduce the concept of hyperplane-based fuzzy membership in our second approach, where the final classifier is termed robust energy-based fuzzy least squares twin support vector machine (F-RELSTSVM). In this technique, the membership values are obtained via a projection-based approach, where the data points are projected onto the hyperplanes. The performance of the proposed algorithms is evaluated on several benchmark and synthetic datasets. The experimental results show that the proposed IF-RELSTSVM and F-RELSTSVM models outperform the baseline algorithms. Statistical tests are performed to check the significance of the proposed algorithms. The results show the applicability of the proposed algorithms to noisy as well as imbalanced datasets.
    Channel Vision Transformers: An Image Is Worth C x 16 x 16 Words. (arXiv:2309.16108v1 [cs.CV])
    Vision Transformer (ViT) has emerged as a powerful architecture in the realm of modern computer vision. However, its application in certain imaging fields, such as microscopy and satellite imaging, presents unique challenges. In these domains, images often contain multiple channels, each carrying semantically distinct and independent information. Furthermore, the model must demonstrate robustness to sparsity in input channels, as they may not be densely available during training or testing. In this paper, we propose a modification to the ViT architecture that enhances reasoning across the input channels and introduce Hierarchical Channel Sampling (HCS) as an additional regularization technique to ensure robustness when only partial channels are presented during test time. Our proposed model, ChannelViT, constructs patch tokens independently from each input channel and utilizes a learnable channel embedding that is added to the patch tokens, similar to positional embeddings. We evaluate the performance of ChannelViT on ImageNet, JUMP-CP (microscopy cell imaging), and So2Sat (satellite imaging). Our results show that ChannelViT outperforms ViT on classification tasks and generalizes well, even when a subset of input channels is used during testing. Across our experiments, HCS proves to be a powerful regularizer, independent of the architecture employed, suggesting itself as a straightforward technique for robust ViT training. Lastly, we find that ChannelViT generalizes effectively even when there is limited access to all channels during training, highlighting its potential for multi-channel imaging under real-world conditions with sparse sensors.
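    A minimal sketch of the channel-wise tokenization described above is given below: each channel is embedded independently by a shared patch projection and tagged with a learnable channel embedding before the tokens are concatenated. Dimensions, the shared projection, and module names are illustrative assumptions rather than the released ChannelViT implementation.

```python
import torch
import torch.nn as nn

class ChannelPatchEmbed(nn.Module):
    """Per-channel patch tokens with additive learnable channel embeddings."""

    def __init__(self, num_channels, patch_size=16, dim=192):
        super().__init__()
        # one patch projection shared across channels, applied per channel
        self.proj = nn.Conv2d(1, dim, kernel_size=patch_size, stride=patch_size)
        self.channel_embed = nn.Parameter(torch.zeros(num_channels, dim))

    def forward(self, x):                       # x: (B, C, H, W)
        tokens = []
        for c in range(x.shape[1]):             # a subset of channels also works
            t = self.proj(x[:, c:c + 1])        # (B, dim, H/ps, W/ps)
            t = t.flatten(2).transpose(1, 2)    # (B, N, dim)
            tokens.append(t + self.channel_embed[c])
        return torch.cat(tokens, dim=1)         # (B, C*N, dim) token sequence
```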
    Unified Long-Term Time-Series Forecasting Benchmark. (arXiv:2309.15946v1 [cs.LG])
    In order to support the advancement of machine learning methods for predicting time-series data, we present a comprehensive dataset designed explicitly for long-term time-series forecasting. We incorporate a collection of datasets obtained from diverse, dynamic systems and real-life records. Each dataset is standardized by dividing it into training and test trajectories with predetermined lookback lengths. We include trajectories of length up to $2000$ to ensure a reliable evaluation of long-term forecasting capabilities. To determine the most effective model in diverse scenarios, we conduct an extensive benchmarking analysis using classical and state-of-the-art models, namely LSTM, DeepAR, NLinear, N-Hits, PatchTST, and LatentODE. Our findings reveal intriguing performance comparisons among these models, highlighting the dataset-dependent nature of model effectiveness. Notably, we introduce a custom latent NLinear model and enhance DeepAR with a curriculum learning phase. Both consistently outperform their vanilla counterparts.
    Max-Sliced Mutual Information. (arXiv:2309.16200v1 [cs.LG])
    Quantifying the dependence between high-dimensional random variables is central to statistical learning and inference. Two classical methods are canonical correlation analysis (CCA), which identifies maximally correlated projected versions of the original variables, and Shannon's mutual information, which is a universal dependence measure that also captures high-order dependencies. However, CCA only accounts for linear dependence, which may be insufficient for certain applications, while mutual information is often infeasible to compute/estimate in high dimensions. This work proposes a middle ground in the form of a scalable information-theoretic generalization of CCA, termed max-sliced mutual information (mSMI). mSMI equals the maximal mutual information between low-dimensional projections of the high-dimensional variables, which reduces back to CCA in the Gaussian case. It enjoys the best of both worlds: capturing intricate dependencies in the data while being amenable to fast computation and scalable estimation from samples. We show that mSMI retains favorable structural properties of Shannon's mutual information, like variational forms and identification of independence. We then study statistical estimation of mSMI, propose an efficiently computable neural estimator, and couple it with formal non-asymptotic error bounds. We present experiments that demonstrate the utility of mSMI for several tasks, encompassing independence testing, multi-view representation learning, algorithmic fairness, and generative modeling. We observe that mSMI consistently outperforms competing methods with little-to-no computational overhead.
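    In the jointly Gaussian case, where the abstract notes that mSMI reduces to CCA, the quantity has a simple closed form: the mutual information of the top canonical pair. The sketch below computes this Gaussian-case value from samples; the neural estimator proposed in the paper is a different, more general object, and the ridge term is only for numerical stability.

```python
import numpy as np

def top_canonical_correlation(X, Y, ridge=1e-8):
    """Largest canonical correlation between samples X (n, dx) and Y (n, dy)."""
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    n = len(X)
    Cxx = X.T @ X / n + ridge * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + ridge * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx))      # whitening transforms
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy))
    return np.linalg.svd(Wx @ Cxy @ Wy.T, compute_uv=False)[0]

def msmi_gaussian(X, Y):
    """mSMI under a joint-Gaussian assumption: MI of the top canonical pair."""
    rho = top_canonical_correlation(X, Y)
    return -0.5 * np.log(1.0 - rho ** 2)
```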
    Symbolic Imitation Learning: From Black-Box to Explainable Driving Policies. (arXiv:2309.16025v1 [cs.LG])
    Current methods of imitation learning (IL), primarily based on deep neural networks, offer efficient means for obtaining driving policies from real-world data but suffer from significant limitations in interpretability and generalizability. These shortcomings are particularly concerning in safety-critical applications like autonomous driving. In this paper, we address these limitations by introducing Symbolic Imitation Learning (SIL), a groundbreaking method that employs Inductive Logic Programming (ILP) to learn driving policies which are transparent, explainable and generalisable from available datasets. Utilizing the real-world highD dataset, we subject our method to a rigorous comparative analysis against prevailing neural-network-based IL methods. Our results demonstrate that SIL not only enhances the interpretability of driving policies but also significantly improves their applicability across varied driving situations. Hence, this work offers a novel pathway to more reliable and safer autonomous driving systems, underscoring the potential of integrating ILP into the domain of IL.
    Label Augmentation Method for Medical Landmark Detection in Hip Radiograph Images. (arXiv:2309.16066v1 [cs.LG])
    This work reports the empirical performance of an automated medical landmark detection method for predicting clinical markers in hip radiograph images. Notably, the detection method was trained using a label-only augmentation scheme; our results indicate that this form of augmentation outperforms traditional data augmentation and produces highly sample-efficient estimators. We train a generic U-Net-based architecture under a curriculum consisting of two phases: initially relaxing the landmarking task by enlarging the label points to regions, then gradually eroding these label regions back to the base task. We measure the benefits of this approach on six datasets of radiographs with gold-standard expert annotations.  ( 2 min )
    Imbalanced Data Stream Classification using Dynamic Ensemble Selection. (arXiv:2309.09175v2 [cs.LG] UPDATED)
    Modern streaming data categorization faces significant challenges from concept drift and class-imbalanced data. These negatively impact the output of the classifier, leading to improper classification. Furthermore, other factors, such as the overlapping of multiple classes, limit the correctness of the output. This work proposes a novel classification framework for nonstationary, drifting, imbalanced data streams that integrates data pre-processing and dynamic ensemble selection techniques. The proposed framework was evaluated using six artificially generated data streams with differing imbalance ratios in combination with two different types of concept drift. Each stream is composed of 200 chunks of 500 objects described by eight features and contains five concept drifts. Seven pre-processing techniques and two dynamic ensemble selection methods were considered. According to the experimental results, data pre-processing combined with dynamic ensemble selection delivers significantly higher accuracy when dealing with imbalanced data streams.
    Just Noticeable Difference Modeling for Face Recognition System. (arXiv:2209.05856v2 [cs.CV] UPDATED)
    High-quality face images are required to guarantee the stability and reliability of automatic face recognition (FR) systems in surveillance and security scenarios. However, a massive amount of face data is usually compressed before being analyzed due to limitations on transmission or storage. The compressed images may lose the powerful identity information, resulting in the performance degradation of the FR system. Herein, we make the first attempt to study just noticeable difference (JND) for the FR system, which can be defined as the maximum distortion that the FR system cannot notice. More specifically, we establish a JND dataset including 3530 original images and 137,670 compressed images generated by advanced reference encoding/decoding software based on the Versatile Video Coding (VVC) standard (VTM-15.0). Subsequently, we develop a novel JND prediction model to directly infer JND images for the FR system. In particular, in order to maximize redundancy removal without impairing robust identity information, we apply the encoder with multiple feature extraction and attention-based feature decomposition modules to progressively decompose face features into two uncorrelated components, i.e., identity and residual features, via self-supervised learning. Then, the residual feature is fed into the decoder to generate the residual map. Finally, the predicted JND map is obtained by subtracting the residual map from the original image. Experimental results have demonstrated that the proposed model achieves higher accuracy of JND map prediction compared with the state-of-the-art JND models, and is capable of saving more bits while maintaining the performance of the FR system compared with VTM-15.0.
    High Perceptual Quality Wireless Image Delivery with Denoising Diffusion Models. (arXiv:2309.15889v1 [eess.IV])
    We consider the image transmission problem over a noisy wireless channel via deep learning-based joint source-channel coding (DeepJSCC) along with a denoising diffusion probabilistic model (DDPM) at the receiver. Specifically, we are interested in the perception-distortion trade-off in the practical finite block length regime, in which separate source and channel coding can be highly suboptimal. We introduce a novel scheme that utilizes the range-null space decomposition of the target image. We transmit the range-space of the image after encoding and employ DDPM to progressively refine its null space contents. Through extensive experiments, we demonstrate significant improvements in distortion and perceptual quality of reconstructed images compared to standard DeepJSCC and the state-of-the-art generative learning-based method. We will publicly share our source code to facilitate further research and reproducibility.
    Classical-to-quantum convolutional neural network transfer learning. (arXiv:2208.14708v2 [quant-ph] UPDATED)
    Machine learning using quantum convolutional neural networks (QCNNs) has demonstrated success in both quantum and classical data classification. In previous studies, QCNNs attained a higher classification accuracy than their classical counterparts under the same training conditions in the few-parameter regime. However, the general performance of large-scale quantum models is difficult to examine because of the limited size of quantum circuits, which can be reliably implemented in the near future. We propose transfer learning as an effective strategy for utilizing small QCNNs in the noisy intermediate-scale quantum era to the full extent. In the classical-to-quantum transfer learning framework, a QCNN can solve complex classification problems without requiring a large-scale quantum circuit by utilizing a pre-trained classical convolutional neural network (CNN). We perform numerical simulations of QCNN models with various sets of quantum convolution and pooling operations for MNIST data classification under transfer learning, in which a classical CNN is trained with Fashion-MNIST data. The results show that transfer learning from classical to quantum CNN performs considerably better than purely classical transfer learning models under similar training conditions.
    Discouraging posterior collapse in hierarchical Variational Autoencoders using context. (arXiv:2302.09976v2 [cs.LG] UPDATED)
    Hierarchical Variational Autoencoders (VAEs) are among the most popular likelihood-based generative models. There is a consensus that the top-down hierarchical VAEs allow effective learning of deep latent structures and avoid problems like posterior collapse. Here, we show that this is not necessarily the case, and the problem of collapsing posteriors remains. To discourage this issue, we propose a deep hierarchical VAE with a context on top. Specifically, we use a Discrete Cosine Transform to obtain the last latent variable. In a series of experiments, we observe that the proposed modification allows us to achieve better utilization of the latent space and does not harm the model's generative abilities.
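    The context idea admits a very small sketch: take a low-frequency block of a 2D discrete cosine transform as a coarse, global summary that the top of the hierarchy can condition on. The block size k below is an illustrative choice, not a value from the paper.

```python
import numpy as np
from scipy.fft import dctn

def dct_context(image, k=8):
    """Low-frequency DCT summary of a (H, W) image, used as a global context."""
    coeffs = dctn(np.asarray(image, dtype=float), norm="ortho")
    return coeffs[:k, :k]
```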
    Bridging Trustworthiness and Open-World Learning: An Exploratory Neural Approach for Enhancing Interpretability, Generalization, and Robustness. (arXiv:2308.03666v3 [stat.ML] UPDATED)
    As researchers strive to narrow the gap between machine intelligence and humans through the development of artificial intelligence technologies, it is imperative that we recognize the critical importance of trustworthiness in open-world settings, which have become ubiquitous in all aspects of daily life. However, several challenges may create a crisis of trust in current artificial intelligence systems that needs to be bridged: 1) insufficient explanation of predictive results; 2) inadequate generalization of learning models; 3) poor adaptability to uncertain environments. Consequently, we explore a neural program that bridges trustworthiness and open-world learning, extending from single-modal to multi-modal scenarios. 1) To enhance design-level interpretability, we first customize trustworthy networks with specific physical meanings; 2) we then design environmental well-being task-interfaces via flexible learning regularizers to improve the generalization of trustworthy learning; 3) we propose to increase the robustness of trustworthy learning by integrating open-world recognition losses with agent mechanisms. Finally, we enhance these trustworthy properties by establishing design-level explainability, environmental well-being task-interfaces and open-world recognition programs. The resulting open-world protocols are applicable across a wide range of surroundings and open-world multimedia recognition scenarios, with significant performance improvements observed.
    Latent Graph Powered Semi-Supervised Learning on Biomedical Tabular Data. (arXiv:2309.15757v2 [cs.LG] UPDATED)
    In the domain of semi-supervised learning, the current approaches insufficiently exploit the potential of considering inter-instance relationships among (un)labeled data. In this work, we address this limitation by providing an approach for inferring latent graphs that capture the intrinsic data relationships. By leveraging graph-based representations, our approach facilitates the seamless propagation of information throughout the graph, enabling the effective incorporation of global and local knowledge. Through evaluations on biomedical tabular datasets, we compare the capabilities of our approach to other contemporary methods. Our work demonstrates the significance of inter-instance relationship discovery as practical means for constructing robust latent graphs to enhance semi-supervised learning techniques. Our method achieves state-of-the-art results on three biomedical datasets.
    Enhancing Sharpness-Aware Optimization Through Variance Suppression. (arXiv:2309.15639v2 [cs.LG] UPDATED)
    Sharpness-aware minimization (SAM) has well documented merits in enhancing generalization of deep neural networks, even without sizable data augmentation. Embracing the geometry of the loss function, where neighborhoods of 'flat minima' heighten generalization ability, SAM seeks 'flat valleys' by minimizing the maximum loss caused by an adversary perturbing parameters within the neighborhood. Although critical to account for sharpness of the loss function, such an 'over-friendly adversary' can curtail the outmost level of generalization. The novel approach of this contribution fosters stabilization of adversaries through variance suppression (VaSSO) to avoid such friendliness. VaSSO's provable stability safeguards its numerical improvement over SAM in model-agnostic tasks, including image classification and machine translation. In addition, experiments confirm that VaSSO endows SAM with robustness against high levels of label noise.
    Deep learning for bias-correcting CMIP6-class Earth system models. (arXiv:2301.01253v3 [physics.ao-ph] UPDATED)
    The accurate representation of precipitation in Earth system models (ESMs) is crucial for reliable projections of the ecological and socioeconomic impacts in response to anthropogenic global warming. The complex cross-scale interactions of processes that produce precipitation are challenging to model, however, inducing potentially strong biases in ESM fields, especially regarding extremes. State-of-the-art bias correction methods only address errors in the simulated frequency distributions locally at every individual grid cell. Improving unrealistic spatial patterns of the ESM output, which would require spatial context, has not been possible so far. Here, we show that a post-processing method based on physically constrained generative adversarial networks (cGANs) can correct biases of a state-of-the-art, CMIP6-class ESM both in local frequency distributions and in the spatial patterns at once. While our method improves local frequency distributions equally well as gold-standard bias-adjustment frameworks, it strongly outperforms any existing methods in the correction of spatial patterns, especially in terms of the characteristic spatial intermittency of precipitation extremes.
    Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs. (arXiv:2309.05516v2 [cs.CL] UPDATED)
    Large Language Models (LLMs) have proven their exceptional capabilities in performing language-related tasks. However, their deployment poses significant challenges due to their considerable memory and storage requirements. In response to this issue, weight-only quantization, particularly 3 and 4-bit weight-only quantization, has emerged as one of the most viable solutions. As the number of bits decreases, the quantization grid broadens, thus emphasizing the importance of up and down rounding. While previous studies have demonstrated that fine-tuning up and down rounding with the addition of perturbations can enhance accuracy in some scenarios, our study is driven by the precise and limited boundary of these perturbations, where only the threshold for altering the rounding value is of significance. Consequently, we propose a concise and highly effective approach for optimizing the weight rounding task. Our method, named SignRound, involves lightweight block-wise tuning using signed gradient descent, enabling us to achieve outstanding results within 400 steps. SignRound competes impressively against recent methods without introducing additional inference overhead. The source code will be publicly available at \url{https://github.com/intel/neural-compressor} soon.
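    A toy version of block-wise rounding tuning with signed gradient descent might look like the sketch below, which learns a rounding offset V, clipped to the threshold range [-0.5, 0.5], to minimize a block's output reconstruction error. The objective, hyperparameters, and straight-through trick are illustrative assumptions and do not reproduce the released SignRound recipe.

```python
import torch

def signround_block(W, X, scale, steps=400, lr=5e-3):
    """W: (out, in) full-precision weights; X: (n, in) calibration inputs;
    scale: quantization step. Returns the tuned, quantized weights."""
    V = torch.zeros_like(W, requires_grad=True)      # learnable rounding offset
    Y_ref = X @ W.T                                  # full-precision block output
    for _ in range(steps):
        soft = W / scale + V
        W_q = scale * (soft + (torch.round(soft) - soft).detach())  # STE rounding
        loss = torch.mean((X @ W_q.T - Y_ref) ** 2)
        loss.backward()
        with torch.no_grad():
            V -= lr * V.grad.sign()                  # signed gradient step
            V.clamp_(-0.5, 0.5)                      # only the rounding threshold matters
            V.grad.zero_()
    return scale * torch.round(W / scale + V.detach())
```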
    DIRA: Dynamic Domain Incremental Regularised Adaptation. (arXiv:2205.00147v4 [cs.LG] UPDATED)
    Autonomous systems (AS) often use Deep Neural Network (DNN) classifiers to allow them to operate in complex, high-dimensional, non-linear, and dynamically changing environments. Due to the complexity of these environments, DNN classifiers may output misclassifications during operation when they face domains not identified during development. Removing a system from operation for retraining becomes impractical as the number of such AS increases. To increase AS reliability and overcome this limitation, DNN classifiers need to have the ability to adapt during operation when faced with different operational domains using a few samples (e.g. 100 samples). However, retraining DNNs on a few samples is known to cause catastrophic forgetting. In this paper, we introduce Dynamic Incremental Regularised Adaptation (DIRA), a framework for operational domain adaptation of DNN classifiers using regularisation techniques to overcome catastrophic forgetting and achieve adaptation when retraining using a few samples of the target domain. Our approach shows improvements on different image classification benchmarks aimed at evaluating robustness to distribution shifts (e.g. CIFAR-10C/100C, ImageNet-C), and produces state-of-the-art performance in comparison with other frameworks from the literature.
    STAG: Enabling Low Latency and Low Staleness of GNN-based Services with Dynamic Graphs. (arXiv:2309.15875v1 [cs.LG])
    Many emerging user-facing services adopt Graph Neural Networks (GNNs) to improve serving accuracy. When the graph used by a GNN model changes, representations (embeddings) of nodes in the graph should be updated accordingly. However, the node representation update is too slow, resulting in either long response latency of user queries (the inference is performed after the update completes) or a severe staleness problem (the inference is performed based on stale data). Our in-depth analysis shows that the slow update is mainly due to the neighbor explosion problem in graphs and duplicated computation. Based on such findings, we propose STAG, a GNN serving framework that enables low latency and low staleness of GNN-based services. It comprises a collaborative serving mechanism and an additivity-based incremental propagation strategy. With the collaborative serving mechanism, only some of the node representations are updated during the update phase, and the final representations are calculated in the inference phase. It alleviates the neighbor explosion problem. The additivity-based incremental propagation strategy reuses intermediate data during the update phase, eliminating the duplicated computation problem. Experimental results show that STAG accelerates the update phase by 1.3x~90.1x, and greatly reduces staleness time with a slight increase in response latency.
    IBIA: An Incremental Build-Infer-Approximate Framework for Approximate Inference of Partition Function. (arXiv:2304.06366v2 [cs.AI] UPDATED)
    Exact computation of the partition function is known to be intractable, necessitating approximate inference techniques. Existing methods for approximate inference are slow to converge for many benchmarks. The control of accuracy-complexity trade-off is also non-trivial in many of these methods. We propose a novel incremental build-infer-approximate (IBIA) framework for approximate inference that addresses these issues. In this framework, the probabilistic graphical model is converted into a sequence of clique tree forests (SCTF) with bounded clique sizes. We show that the SCTF can be used to efficiently compute the partition function. We propose two new algorithms which are used to construct the SCTF and prove the correctness of both. The first is an algorithm for incremental construction of CTFs that is guaranteed to give a valid CTF with bounded clique sizes and the second is an approximation algorithm that takes a calibrated CTF as input and yields a valid and calibrated CTF with reduced clique sizes as the output. We have evaluated our method using several benchmark sets from recent UAI competitions and our results show good accuracies with competitive runtimes.
  • Open

    Learning Interpretable Characteristic Kernels via Decision Forests. (arXiv:1812.00029v3 [stat.ML] UPDATED)
    Decision forests are widely used for classification and regression tasks. A lesser known property of tree-based methods is that one can construct a proximity matrix from the tree(s), and these proximity matrices are induced kernels. While there has been extensive research on the applications and properties of kernels, there is relatively little research on kernels induced by decision forests. We construct Kernel Mean Embedding Random Forests (KMERF), which induce kernels from random trees and/or forests using leaf-node proximity. We introduce the notion of an asymptotically characteristic kernel, and prove that KMERF kernels are asymptotically characteristic for both discrete and continuous data. Because KMERF is data-adaptive, we suspected it would outperform kernels selected a priori on finite sample data. We illustrate that KMERF nearly dominates current state-of-the-art kernel-based tests across a diverse range of high-dimensional two-sample and independence testing settings. Furthermore, our forest-based approach is interpretable, and provides feature importance metrics that readily distinguish important dimensions, unlike other high-dimensional non-parametric testing procedures. Hence, this work demonstrates the decision forest-based kernel can be more powerful and more interpretable than existing methods, flying in the face of conventional wisdom of the trade-off between the two.
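    The induced kernel has a compact description: the proximity of two samples is the fraction of trees in which they fall in the same leaf. The sketch below builds such a proximity kernel with scikit-learn; the forest type, hyperparameters, and use of in-sample proximities are illustrative choices rather than the exact KMERF recipe.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def forest_proximity_kernel(X, y, n_estimators=300, random_state=0):
    """Kernel induced by a random forest via leaf-node proximity:
    K[i, j] = fraction of trees in which samples i and j share a leaf."""
    forest = RandomForestClassifier(n_estimators=n_estimators,
                                    random_state=random_state).fit(X, y)
    leaves = forest.apply(X)                      # (n_samples, n_trees) leaf ids
    K = np.zeros((len(X), len(X)))
    for t in range(leaves.shape[1]):
        K += np.equal.outer(leaves[:, t], leaves[:, t])
    return K / leaves.shape[1]
```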
    Patch-level Neighborhood Interpolation: A General and Effective Graph-based Regularization Strategy. (arXiv:1911.09307v2 [cs.LG] UPDATED)
    Regularization plays a crucial role in machine learning models, especially for deep neural networks. The existing regularization techniques mainly rely on the i.i.d. assumption and only consider knowledge from the current sample, without leveraging the neighboring relationships between samples. In this work, we propose a general regularizer called \textbf{Patch-level Neighborhood Interpolation~(Pani)} that incorporates non-local representations into network computation. Our proposal explicitly constructs patch-level graphs in different layers and then linearly interpolates neighborhood patch features, serving as a general and effective regularization strategy. Further, we customize our approach into two kinds of popular regularization methods, namely Virtual Adversarial Training (VAT) and MixUp as well as its variants. The first derived \textbf{Pani VAT} presents a novel way to construct non-local adversarial smoothness by employing patch-level interpolated perturbations. The second derived \textbf{Pani MixUp} method extends MixUp, achieving superiority over MixUp and competitive performance against state-of-the-art MixUp variants with a significant advantage in computational efficiency. Extensive experiments have verified the effectiveness of our Pani approach in both supervised and semi-supervised settings.
    Nonparametric plug-in classifier for multiclass classification of S.D.E. paths. (arXiv:2212.10259v2 [math.ST] UPDATED)
    We study the multiclass classification problem where the features come from a mixture of time-homogeneous diffusions. Specifically, the classes are discriminated by their drift functions, while the diffusion coefficient is common to all classes and unknown. In this framework, we build a plug-in classifier which relies on nonparametric estimators of the drift and diffusion functions. We first establish the consistency of our classification procedure under mild assumptions and then provide rates of convergence under different sets of assumptions. Finally, a numerical study supports our theoretical findings.
    Asset Bundling for Wind Power Forecasting. (arXiv:2309.16492v1 [stat.ME])
    The growing penetration of intermittent, renewable generation in US power grids, especially wind and solar generation, results in increased operational uncertainty. In that context, accurate forecasts are critical, especially for wind generation, which exhibits large variability and is historically harder to predict. To overcome this challenge, this work proposes a novel Bundle-Predict-Reconcile (BPR) framework that integrates asset bundling, machine learning, and forecast reconciliation techniques. The BPR framework first learns an intermediate hierarchy level (the bundles), then predicts wind power at the asset, bundle, and fleet level, and finally reconciles all forecasts to ensure consistency. This approach effectively introduces an auxiliary learning task (predicting the bundle-level time series) to help the main learning tasks. The paper also introduces new asset-bundling criteria that capture the spatio-temporal dynamics of wind power time series. Extensive numerical experiments are conducted on an industry-size dataset of 283 wind farms in the MISO footprint. The experiments consider short-term and day-ahead forecasts, and evaluate a large variety of forecasting models that include weather predictions as covariates. The results demonstrate the benefits of BPR, which consistently and significantly improves forecast accuracy over baselines, especially at the fleet level.
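    The final reconciliation step can be illustrated with a standard least-squares projection onto the coherent subspace defined by a summing matrix (a row of ones for the fleet, one 0/1 row per bundle, identity rows for the assets). The BPR paper may use a different reconciliation method, so the sketch below is only a generic example of the idea.

```python
import numpy as np

def ols_reconcile(S, y_hat):
    """Least-squares forecast reconciliation.

    S: (m, n_assets) summing matrix mapping asset-level values to all levels;
    y_hat: (m,) or (m, horizon) stacked base forecasts in the same row order.
    Returns forecasts that are exactly consistent across asset, bundle, and
    fleet levels."""
    b = np.linalg.lstsq(S, y_hat, rcond=None)[0]   # implied asset-level forecasts
    return S @ b                                   # project onto the coherent subspace
```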
    High-dimensional robust regression under heavy-tailed data: Asymptotics and Universality. (arXiv:2309.16476v1 [math.ST])
    We investigate the high-dimensional properties of robust regression estimators in the presence of heavy-tailed contamination of both the covariates and response functions. In particular, we provide a sharp asymptotic characterisation of M-estimators trained on a family of elliptical covariate and noise data distributions including cases where second and higher moments do not exist. We show that, despite being consistent, the Huber loss with optimally tuned location parameter $\delta$ is suboptimal in the high-dimensional regime in the presence of heavy-tailed noise, highlighting the necessity of further regularisation to achieve optimal performance. This result also uncovers the existence of a curious transition in $\delta$ as a function of the sample complexity and contamination. Moreover, we derive the decay rates for the excess risk of ridge regression. We show that, while it is both optimal and universal for noise distributions with finite second moment, its decay rate can be considerably faster when the covariates' second moment does not exist. Finally, we show that our formulas readily generalise to a richer family of models and data distributions, such as generalised linear estimation with arbitrary convex regularisation trained on mixture models.
    Flexible and efficient spatial extremes emulation via variational autoencoders. (arXiv:2307.08079v2 [stat.ML] UPDATED)
    Many real-world processes have complex tail dependence structures that cannot be characterized using classical Gaussian processes. More flexible spatial extremes models exhibit appealing extremal dependence properties but are often prohibitively expensive to fit and simulate from in high dimensions. In this paper, we develop a new spatial extremes model that has flexible and non-stationary dependence properties, and we integrate it into the encoding-decoding structure of a variational autoencoder (XVAE), whose parameters are estimated via variational Bayes combined with deep learning. The XVAE can be used as a spatio-temporal emulator that characterizes the distribution of potential mechanistic model output states and produces outputs that have the same statistical properties as the inputs, especially in the tail. As an aside, our approach also provides a novel way of making fast inference with complex extreme-value processes. Through extensive simulation studies, we show that our XVAE is substantially more time-efficient than traditional Bayesian inference while also outperforming many spatial extremes models with a stationary dependence structure. To further demonstrate the computational power of the XVAE, we analyze a high-resolution satellite-derived dataset of sea surface temperature in the Red Sea, which includes 30 years of daily measurements at 16703 grid cells. We find that the extremal dependence strength is weaker in the interior of the Red Sea and has decreased slightly over time.
    Unsupervised Fact Verification by Language Model Distillation. (arXiv:2309.16540v1 [cs.CL])
    Unsupervised fact verification aims to verify a claim using evidence from a trustworthy knowledge base without any kind of data annotation. To address this challenge, algorithms must produce features for every claim that are both semantically meaningful, and compact enough to find a semantic alignment with the source information. In contrast to previous work, which tackled the alignment problem by learning over annotated corpora of claims and their corresponding labels, we propose SFAVEL (Self-supervised Fact Verification via Language Model Distillation), a novel unsupervised framework that leverages pre-trained language models to distil self-supervised features into high-quality claim-fact alignments without the need for annotations. This is enabled by a novel contrastive loss function that encourages features to attain high-quality claim and evidence alignments whilst preserving the semantic relationships across the corpora. Notably, we present results that achieve a new state-of-the-art on the standard FEVER fact verification benchmark (+8% accuracy) with linear evaluation.  ( 2 min )
    Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit. (arXiv:2309.16620v1 [stat.ML])
    The cost of hyperparameter tuning in deep learning has been rising with model sizes, prompting practitioners to find new tuning methods using a proxy of smaller networks. One such proposal uses $\mu$P parameterized networks, where the optimal hyperparameters for small width networks transfer to networks with arbitrarily large width. However, in this scheme, hyperparameters do not transfer across depths. As a remedy, we study residual networks with a residual branch scale of $1/\sqrt{\text{depth}}$ in combination with the $\mu$P parameterization. We provide experiments demonstrating that residual architectures including convolutional ResNets and Vision Transformers trained with this parameterization exhibit transfer of optimal hyperparameters across width and depth on CIFAR-10 and ImageNet. Furthermore, our empirical findings are supported and motivated by theory. Using recent developments in the dynamical mean field theory (DMFT) description of neural network learning dynamics, we show that this parameterization of ResNets admits a well-defined feature learning joint infinite-width and infinite-depth limit and show convergence of finite-size network dynamics towards this limit.
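    The depth scaling itself is a one-line change to a residual block, as in the sketch below; the width-wise $\mu$P parameterization (initialization and learning-rate scaling) is omitted here, so this only illustrates the $1/\sqrt{\text{depth}}$ branch scale.

```python
import torch
import torch.nn as nn

class ScaledResidualBlock(nn.Module):
    """Residual block whose branch is scaled by 1/sqrt(depth)."""

    def __init__(self, dim, depth):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.scale = depth ** -0.5                 # 1/sqrt(total network depth)

    def forward(self, x):
        return x + self.scale * self.branch(x)
```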
    Transport map unadjusted Langevin algorithms: learning and discretizing perturbed samplers. (arXiv:2302.07227v3 [stat.ME] UPDATED)
    Langevin dynamics are widely used in sampling high-dimensional, non-Gaussian distributions whose densities are known up to a normalizing constant. In particular, there is strong interest in unadjusted Langevin algorithms (ULA), which directly discretize Langevin dynamics to estimate expectations over the target distribution. We study the use of transport maps that approximately normalize a target distribution as a way to precondition and accelerate the convergence of Langevin dynamics. We show that in continuous time, when a transport map is applied to Langevin dynamics, the result is a Riemannian manifold Langevin dynamics (RMLD) with metric defined by the transport map. We also show that applying a transport map to an irreversibly-perturbed ULA results in a geometry-informed irreversible perturbation (GiIrr) of the original dynamics. These connections suggest more systematic ways of learning metrics and perturbations, and also yield alternative discretizations of the RMLD described by the map, which we study. Under appropriate conditions, these discretized processes can be endowed with non-asymptotic bounds describing convergence to the target distribution in 2-Wasserstein distance. Illustrative numerical results complement our theoretical claims.
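    A rough sketch of the idea under stated assumptions: an affine transport map and a toy Gaussian target. The paper also treats nonlinear maps, irreversible perturbations, and the alternative RMLD discretizations, which are omitted here.

```python
import numpy as np

def transport_map_ula(grad_log_target, A, b, n_steps=5000, h=1e-2, dim=2, rng=None):
    """Unadjusted Langevin algorithm run in the reference space of an affine
    transport map x = A y + b (a sketch; only the affine case is handled)."""
    rng = np.random.default_rng(rng)
    y = np.zeros(dim)
    samples = np.empty((n_steps, dim))
    for k in range(n_steps):
        x = A @ y + b                       # push current state through the map
        grad_y = A.T @ grad_log_target(x)   # chain rule: pullback of the score
        y = y + h * grad_y + np.sqrt(2 * h) * rng.standard_normal(dim)
        samples[k] = A @ y + b              # store samples in the original space
    return samples

# Example: standard Gaussian target with an (assumed) diagonal preconditioning map
grad_log_target = lambda x: -x
samples = transport_map_ula(grad_log_target, A=np.diag([2.0, 0.5]), b=np.zeros(2))
```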
    A parsimonious, computationally efficient machine learning method for spatial regression. (arXiv:2309.16448v1 [stat.ML])
    We introduce the modified planar rotator method (MPRS), a physically inspired machine learning method for spatial/temporal regression. MPRS is a non-parametric model which incorporates spatial or temporal correlations via short-range, distance-dependent ``interactions'' without assuming a specific form for the underlying probability distribution. Predictions are obtained by means of a fully autonomous learning algorithm which employs equilibrium conditional Monte Carlo simulations. MPRS is able to handle scattered data and arbitrary spatial dimensions. We report tests on various synthetic and real-world data in one, two and three dimensions which demonstrate that the MPRS prediction performance (without parameter tuning) is competitive with standard interpolation methods such as ordinary kriging and inverse distance weighting. In particular, MPRS is an effective gap-filling method for rough and non-Gaussian data (e.g., daily precipitation time series). MPRS shows superior computational efficiency and scalability for large samples. Massive data sets involving millions of nodes can be processed in a few seconds on a standard personal computer.  ( 2 min )
    A framework for paired-sample hypothesis testing for high-dimensional data. (arXiv:2309.16274v1 [stat.ML])
    The standard paired-sample testing approach in the multidimensional setting applies multiple univariate tests on the individual features, followed by p-value adjustments. Such an approach suffers when the data carry numerous features. A number of studies have shown that classification accuracy can be seen as a proxy for two-sample testing. However, neither theoretical foundations nor practical recipes have been proposed so far on how this strategy could be extended to multidimensional paired-sample testing. In this work, we put forward the idea that scoring functions can be produced by the decision rules defined by the perpendicular bisecting hyperplanes of the line segments connecting each pair of instances. Then, the optimal scoring function can be obtained by the pseudomedian of those rules, which we estimate by extending naturally the Hodges-Lehmann estimator. We accordingly propose a framework of a two-step testing procedure. First, we estimate the bisecting hyperplanes for each pair of instances and an aggregated rule derived through the Hodges-Lehmann estimator. The paired samples are scored by this aggregated rule to produce a unidimensional representation. Second, we perform a Wilcoxon signed-rank test on the obtained representation. Our experiments indicate that our approach has substantial performance gains in testing accuracy compared to the traditional multivariate and multiple testing, while at the same time estimating each feature's contribution to the final result.  ( 2 min )
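    A simplified sketch of the two-step procedure under stated assumptions: the per-pair rules are the perpendicular bisecting hyperplanes, but the aggregation below uses a coordinate-wise median of the normal directions rather than the paper's Hodges-Lehmann-style pseudomedian, and the synthetic data are purely illustrative.

```python
import numpy as np
from scipy.stats import wilcoxon

def paired_test_via_bisectors(X, Y):
    """Build per-pair bisecting-hyperplane rules, aggregate them (here by a simple
    coordinate-wise median), score the pairs to get a one-dimensional representation,
    and run a Wilcoxon signed-rank test on the scores."""
    diffs = X - Y                               # normals of the bisecting hyperplanes
    norms = np.linalg.norm(diffs, axis=1, keepdims=True)
    normals = diffs / np.clip(norms, 1e-12, None)
    w = np.median(normals, axis=0)              # aggregated direction (simplified)
    scores = (X - Y) @ w                        # unidimensional representation
    return wilcoxon(scores)

rng = np.random.default_rng(0)
X = rng.normal(0.3, 1.0, size=(50, 20))        # "after" samples with a small shift
Y = rng.normal(0.0, 1.0, size=(50, 20))        # paired "before" samples
print(paired_test_via_bisectors(X, Y))
```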
    From Complexity to Clarity: Analytical Expressions of Deep Neural Network Weights via Clifford's Geometric Algebra and Convexity. (arXiv:2309.16512v1 [cs.LG])
    In this paper, we introduce a novel analysis of neural networks based on geometric (Clifford) algebra and convex optimization. We show that optimal weights of deep ReLU neural networks are given by the wedge product of training samples when trained with standard regularized loss. Furthermore, the training problem reduces to convex optimization over wedge product features, which encode the geometric structure of the training dataset. This structure is given in terms of signed volumes of triangles and parallelotopes generated by data vectors. The convex problem finds a small subset of samples via $\ell_1$ regularization to discover only relevant wedge product features. Our analysis provides a novel perspective on the inner workings of deep neural networks and sheds light on the role of the hidden layers.  ( 2 min )
    Computational Lower Bounds for Graphon Estimation via Low-degree Polynomials. (arXiv:2308.15728v2 [math.ST] UPDATED)
    Graphon estimation has been one of the most fundamental problems in network analysis and has received considerable attention in the past decade. From the statistical perspective, the minimax error rate of graphon estimation has been established by Gao et al (2015) for both stochastic block model (SBM) and nonparametric graphon estimation. The statistical optimal estimators are based on constrained least squares and have computational complexity exponential in the dimension. From the computational perspective, the best-known polynomial-time estimator is based on universal singular value thresholding (USVT), but it can only achieve a much slower estimation error rate than the minimax one. It is natural to wonder if such a gap is essential. The computational optimality of the USVT or the existence of a computational barrier in graphon estimation has been a long-standing open problem. In this work, we take the first step towards it and provide rigorous evidence for the computational barrier in graphon estimation via low-degree polynomials. Specifically, in both SBM and nonparametric graphon estimation, we show that for low-degree polynomial estimators, their estimation error rates cannot be significantly better than that of the USVT under a wide range of parameter regimes. Our results are proved based on the recent development of low-degree polynomials by Schramm and Wein (2022), while we overcome a few key challenges in applying it to the general graphon estimation problem. By leveraging our main results, we also provide a computational lower bound on the clustering error for community detection in SBM with a growing number of communities and this yields a new piece of evidence for the conjectured Kesten-Stigum threshold for efficient community recovery.
    A Primer on Bayesian Neural Networks: Review and Debates. (arXiv:2309.16314v1 [stat.ML])
    Neural networks have achieved remarkable performance across various problem domains, but their widespread applicability is hindered by inherent limitations such as overconfidence in predictions, lack of interpretability, and vulnerability to adversarial attacks. To address these challenges, Bayesian neural networks (BNNs) have emerged as a compelling extension of conventional neural networks, integrating uncertainty estimation into their predictive capabilities. This comprehensive primer presents a systematic introduction to the fundamental concepts of neural networks and Bayesian inference, elucidating their synergistic integration for the development of BNNs. The target audience comprises statisticians with a potential background in Bayesian methods but lacking deep learning expertise, as well as machine learners proficient in deep neural networks but with limited exposure to Bayesian statistics. We provide an overview of commonly employed priors, examining their impact on model behavior and performance. Additionally, we delve into the practical considerations associated with training and inference in BNNs. Furthermore, we explore advanced topics within the realm of BNN research, acknowledging the existence of ongoing debates and controversies. By offering insights into cutting-edge developments, this primer not only equips researchers and practitioners with a solid foundation in BNNs, but also illuminates the potential applications of this dynamic field. As a valuable resource, it fosters an understanding of BNNs and their promising prospects, facilitating further advancements in the pursuit of knowledge and innovation.
    Generative Semi-supervised Learning with Meta-Optimized Synthetic Samples. (arXiv:2309.16143v1 [cs.LG])
    Semi-supervised learning (SSL) is a promising approach for training deep classification models using labeled and unlabeled datasets. However, existing SSL methods rely on a large unlabeled dataset, which may not always be available in many real-world applications due to legal constraints (e.g., GDPR). In this paper, we investigate the research question: Can we train SSL models without real unlabeled datasets? Instead of using real unlabeled datasets, we propose an SSL method using synthetic datasets generated from generative foundation models trained on datasets containing millions of samples in diverse domains (e.g., ImageNet). Our main concepts are identifying synthetic samples that emulate unlabeled samples from generative foundation models and training classifiers using these synthetic samples. To achieve this, our method is formulated as an alternating optimization problem: (i) meta-learning of generative foundation models and (ii) SSL of classifiers using real labeled and synthetic unlabeled samples. For (i), we propose a meta-learning objective that optimizes latent variables to generate samples that resemble real labeled samples and minimize the validation loss. For (ii), we propose a simple unsupervised loss function that regularizes the feature extractors of classifiers to maximize the performance improvement obtained from synthetic samples. We confirm that our method outperforms baselines using generative foundation models on SSL. We also demonstrate that our methods outperform SSL using real unlabeled datasets in scenarios with extremely small amounts of labeled datasets. This suggests that synthetic samples have the potential to provide improvement gains more efficiently than real unlabeled data.  ( 3 min )
    Stackelberg Batch Policy Learning. (arXiv:2309.16188v1 [stat.ML])
    Batch reinforcement learning (RL) defines the task of learning from a fixed batch of data lacking exhaustive exploration. Worst-case optimality algorithms, which calibrate a value-function model class from logged experience and perform some type of pessimistic evaluation under the learned model, have emerged as a promising paradigm for batch RL. However, contemporary works on this stream have commonly overlooked the hierarchical decision-making structure hidden in the optimization landscape. In this paper, we adopt a game-theoretical viewpoint and model the policy learning diagram as a two-player general-sum game with a leader-follower structure. We propose a novel stochastic gradient-based learning algorithm: StackelbergLearner, in which the leader player updates according to the total derivative of its objective instead of the usual individual gradient, and the follower player makes individual updates and ensures transition-consistent pessimistic reasoning. The derived learning dynamics naturally lend StackelbergLearner to a game-theoretic interpretation and provide a convergence guarantee to differentiable Stackelberg equilibria. From a theoretical standpoint, we provide instance-dependent regret bounds with general function approximation, which show that our algorithm can learn a best-effort policy that is able to compete against any comparator policy that is covered by batch data. Notably, our theoretical regret guarantees only require realizability without any data coverage and strong function approximation conditions, e.g., Bellman closedness, which is in contrast to prior works lacking such guarantees. Through comprehensive experiments, we find that our algorithm consistently performs as well as or better than state-of-the-art methods on batch RL benchmarks and real-world datasets.
    Feature Normalization Prevents Collapse of Non-contrastive Learning Dynamics. (arXiv:2309.16109v1 [cs.LG])
    Contrastive learning is a self-supervised representation learning framework, where two positive views generated through data augmentation are made similar by an attraction force in a data representation space, while a repulsive force makes them far from negative examples. Non-contrastive learning, represented by BYOL and SimSiam, further gets rid of negative examples and improves computational efficiency. While learned representations may collapse into a single point due to the lack of the repulsive force at first sight, Tian et al. (2021) revealed through the learning dynamics analysis that the representations can avoid collapse if data augmentation is sufficiently stronger than regularization. However, their analysis does not take into account commonly-used feature normalization, a normalizer before measuring the similarity of representations, and hence excessively strong regularization may collapse the dynamics, which is an unnatural behavior under the presence of feature normalization. Therefore, we extend the previous theory based on the L2 loss by considering the cosine loss, which involves feature normalization. We show that the cosine loss induces sixth-order dynamics (while the L2 loss induces a third-order one), in which a stable equilibrium dynamically emerges even if there are only collapsed solutions with given initial parameters. Thus, we offer a new understanding that feature normalization plays an important role in robustly preventing the dynamics collapse.
    Generating Personalized Insulin Treatments Strategies with Deep Conditional Generative Time Series Models. (arXiv:2309.16521v1 [stat.ML])
    We propose a novel framework that combines deep generative time series models with decision theory for generating personalized treatment strategies. It leverages historical patient trajectory data to jointly learn the generation of realistic personalized treatment and future outcome trajectories through deep generative time series models. In particular, our framework enables the generation of novel multivariate treatment strategies tailored to the personalized patient history and trained for optimal expected future outcomes based on conditional expected utility maximization. We demonstrate our framework by generating personalized insulin treatment strategies and blood glucose predictions for hospitalized diabetes patients, showcasing the potential of our approach for generating improved personalized treatment strategies. Keywords: deep generative model, probabilistic decision support, personalized treatment generation, insulin and blood glucose prediction  ( 2 min )
    Bridging Trustworthiness and Open-World Learning: An Exploratory Neural Approach for Enhancing Interpretability, Generalization, and Robustness. (arXiv:2308.03666v3 [stat.ML] UPDATED)
    As researchers strive to narrow the gap between machine intelligence and human intelligence through the development of artificial intelligence technologies, it is imperative that we recognize the critical importance of trustworthiness in the open world, which has become ubiquitous in all aspects of daily life. However, several challenges may create a crisis of trust in current artificial intelligence systems that need to be bridged: 1) Insufficient explanation of predictive results; 2) Inadequate generalization for learning models; 3) Poor adaptability to uncertain environments. Consequently, we explore a neural program to bridge trustworthiness and open-world learning, extending from single-modal to multi-modal scenarios. 1) To enhance design-level interpretability, we first customize trustworthy networks with specific physical meanings; 2) We then design environmental well-being task-interfaces via flexible learning regularizers for improving the generalization of trustworthy learning; 3) We propose to increase the robustness of trustworthy learning by integrating open-world recognition losses with agent mechanisms. Eventually, we enhance various trustworthy properties through the establishment of design-level explainability, environmental well-being task-interfaces and open-world recognition programs. These designed open-world protocols are applicable across a wide range of surroundings, under open-world multimedia recognition scenarios, with significant performance improvements observed.
    Horospherical Decision Boundaries for Large Margin Classification in Hyperbolic Space. (arXiv:2302.06807v3 [stat.ML] UPDATED)
    Hyperbolic spaces have been quite popular in the recent past for representing hierarchically organized data. Further, several classification algorithms for data in these spaces have been proposed in the literature. These algorithms mainly use either hyperplanes or geodesics as decision boundaries in a large-margin classifier setting, leading to a non-convex optimization problem. In this paper, we propose a novel large-margin classifier based on horospherical decision boundaries that leads to a geodesically convex optimization problem that can be optimized using any Riemannian gradient descent technique, guaranteeing a globally optimal solution. We present several experiments depicting the competitive performance of our classifier in comparison to SOTA.
    Exploiting Edge Features in Graphs with Fused Network Gromov-Wasserstein Distance. (arXiv:2309.16604v1 [stat.ML])
    Pairwise comparison of graphs is key to many applications in machine learning, ranging from clustering and kernel-based classification/regression to, more recently, supervised graph prediction. Distances between graphs usually rely on informative representations of these structured objects such as bag of substructures or other graph embeddings. A recently popular solution consists in representing graphs as metric measure spaces, allowing one to leverage Optimal Transport, which provides meaningful distances for comparing them: the Gromov-Wasserstein distances. However, this family of distances overlooks edge attributes, which are essential for many structured objects. In this work, we introduce an extension of the Gromov-Wasserstein distance for comparing graphs in which both nodes and edges have features. We propose novel algorithms for distance and barycenter computation. We empirically show the effectiveness of the novel distance in learning tasks where graphs occur in either input space or output space, such as classification and graph prediction.
    Lossless Transformations and Excess Risk Bounds in Statistical Inference. (arXiv:2307.16735v2 [cs.IT] UPDATED)
    We study the excess minimum risk in statistical inference, defined as the difference between the minimum expected loss in estimating a random variable from an observed feature vector and the minimum expected loss in estimating the same random variable from a transformation (statistic) of the feature vector. After characterizing lossless transformations, i.e., transformations for which the excess risk is zero for all loss functions, we construct a partitioning test statistic for the hypothesis that a given transformation is lossless and show that for i.i.d. data the test is strongly consistent. More generally, we develop information-theoretic upper bounds on the excess risk that uniformly hold over fairly general classes of loss functions. Based on these bounds, we introduce the notion of a delta-lossless transformation and give sufficient conditions for a given transformation to be universally delta-lossless. Applications to classification, nonparametric regression, portfolio strategies, information bottleneck, and deep learning, are also surveyed.
    Dynamic Selection in Algorithmic Decision-making. (arXiv:2108.12547v3 [econ.EM] UPDATED)
    This paper identifies and addresses dynamic selection problems in online learning algorithms with endogenous data. In a contextual multi-armed bandit model, a novel bias (self-fulfilling bias) arises because the endogeneity of the data influences the choices of decisions, affecting the distribution of future data to be collected and analyzed. We propose an instrumental-variable-based algorithm to correct for the bias. It obtains true parameter values and attains low (logarithmic-like) regret levels. We also prove a central limit theorem for statistical inference. To establish the theoretical properties, we develop a general technique that untangles the interdependence between data and actions.
    Data Augmentation in the Underparameterized and Overparameterized Regimes. (arXiv:2202.09134v3 [cs.LG] UPDATED)
    We provide results that exactly quantify how data augmentation affects the variance and limiting distribution of estimates, and analyze several specific models in detail. The results confirm some observations made in machine learning practice, but also lead to unexpected findings: Data augmentation may increase rather than decrease the uncertainty of estimates, such as the empirical prediction risk. It can act as a regularizer, but fails to do so in certain high-dimensional problems, and it may shift the double-descent peak of an empirical risk. Overall, the analysis shows that several properties data augmentation has been attributed with are neither simply true nor false, but rather depend on a combination of factors -- notably the data distribution, the properties of the estimator, and the interplay of sample size, number of augmentations, and dimension. Our main theoretical tool is a limit theorem for functions of randomly transformed, high-dimensional random vectors. The proof draws on work in probability on noise stability of functions of many variables.
    Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints. (arXiv:2309.16240v1 [cs.LG])
    The increasing capabilities of large language models (LLMs) raise opportunities for artificial general intelligence but concurrently amplify safety concerns, such as potential misuse of AI systems, necessitating effective AI alignment. Reinforcement Learning from Human Feedback (RLHF) has emerged as a promising pathway towards AI alignment but brings forth challenges due to its complexity and dependence on a separate reward model. Direct Preference Optimization (DPO) has been proposed as an alternative, and it remains equivalent to RLHF under the reverse KL regularization constraint. This paper presents $f$-DPO, a generalized approach to DPO by incorporating diverse divergence constraints. We show that under certain $f$-divergences, including Jensen-Shannon divergence, forward KL divergences and $\alpha$-divergences, the complex relationship between the reward and optimal policy can also be simplified by addressing the Karush-Kuhn-Tucker conditions. This eliminates the need for estimating the normalizing constant in the Bradley-Terry model and enables a tractable mapping between the reward function and the optimal policy. Our approach optimizes LLMs to align with human preferences in a more efficient and supervised manner under a broad set of divergence constraints. Empirically, adopting these divergences ensures a balance between alignment performance and generation diversity. Importantly, $f$-DPO outperforms PPO-based methods in divergence efficiency, and divergence constraints directly influence expected calibration error (ECE).
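    For context, a minimal sketch of the standard (reverse-KL) DPO objective on a batch of preference pairs, with assumed per-sequence log-probabilities; the $f$-DPO generalization described above changes this expression for other divergences and is not reproduced here.

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard reverse-KL DPO loss on a batch of (chosen, rejected) pairs.

    logp_* are per-sequence log-probabilities under the trained policy,
    ref_logp_* under the frozen reference model."""
    chosen = logp_w - ref_logp_w        # log-ratio for the preferred response
    rejected = logp_l - ref_logp_l      # log-ratio for the dispreferred response
    return -F.logsigmoid(beta * (chosen - rejected)).mean()

# Toy example with assumed per-sequence log-probabilities
logp_w = torch.tensor([-12.0, -9.5])
logp_l = torch.tensor([-14.0, -11.0])
ref_logp_w = torch.tensor([-12.5, -10.0])
ref_logp_l = torch.tensor([-13.5, -10.5])
print(dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l))
```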
    M-OFDFT: Overcoming the Barrier of Orbital-Free Density Functional Theory for Molecular Systems Using Deep Learning. (arXiv:2309.16578v1 [stat.ML])
    Orbital-free density functional theory (OFDFT) is a quantum chemistry formulation that has a lower cost scaling than the prevailing Kohn-Sham DFT, which is increasingly desired for contemporary molecular research. However, its accuracy is limited by the kinetic energy density functional, which is notoriously hard to approximate for non-periodic molecular systems. In this work, we propose M-OFDFT, an OFDFT approach capable of solving molecular systems using a deep-learning functional model. We build the essential nonlocality into the model, which is made affordable by the concise density representation as expansion coefficients under an atomic basis. With techniques to address unconventional learning challenges therein, M-OFDFT achieves a comparable accuracy with Kohn-Sham DFT on a wide range of molecules untouched by OFDFT before. More attractively, M-OFDFT extrapolates well to molecules much larger than those in training, which unleashes the appealing scaling for studying large molecules including proteins, representing an advancement of the accuracy-efficiency trade-off frontier in quantum chemistry.
    Smooth Nested Simulation: Bridging Cubic and Square Root Convergence Rates in High Dimensions. (arXiv:2201.02958v5 [stat.ME] UPDATED)
    Nested simulation concerns estimating functionals of a conditional expectation via simulation. In this paper, we propose a new method based on kernel ridge regression to exploit the smoothness of the conditional expectation as a function of the multidimensional conditioning variable. Asymptotic analysis shows that the proposed method can effectively alleviate the curse of dimensionality on the convergence rate as the simulation budget increases, provided that the conditional expectation is sufficiently smooth. The smoothness bridges the gap between the cubic root convergence rate (that is, the optimal rate for the standard nested simulation) and the square root convergence rate (that is, the canonical rate for the standard Monte Carlo simulation). We demonstrate the performance of the proposed method via numerical examples from portfolio risk management and input uncertainty quantification.
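    A toy sketch of the approach under stated assumptions: noisy inner-simulation estimates of a conditional expectation are smoothed over the outer scenarios with kernel ridge regression, and a simple exceedance probability stands in for the risk functional; the kernel choice and regularization below are arbitrary.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(1)

# Outer scenarios (conditioning variables) and crude inner-simulation estimates
n_outer, n_inner, d = 200, 10, 3
X = rng.normal(size=(n_outer, d))
true_cond_exp = np.sin(X).sum(axis=1)                  # assumed ground truth for the demo
Y = true_cond_exp + rng.normal(scale=1.0, size=(n_outer, n_inner)).mean(axis=1)

# Kernel ridge regression smooths the noisy inner estimates over the scenarios
krr = KernelRidge(kernel="rbf", alpha=1e-2, gamma=0.5).fit(X, Y)
smoothed = krr.predict(X)

# Plug the smoothed conditional expectation into the outer functional,
# here a simple exceedance probability as a stand-in for a risk measure
print("nested-simulation estimate:", np.mean(smoothed > 1.0))
```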
    Is My Prediction Arbitrary? Confounding Effects of Variance in Fair Classification. (arXiv:2301.11562v5 [cs.LG] UPDATED)
    Variance in predictions across different trained models is a significant, under-explored source of error in fair classification. In practice, the variance on some data examples is so large that decisions can be effectively arbitrary. To investigate this problem, we take an experimental approach and make four overarching contributions: We 1) Define a metric called self-consistency, derived from variance, which we use as a proxy for measuring and reducing arbitrariness; 2) Develop an ensembling algorithm that abstains from classification when a prediction would be arbitrary; 3) Conduct the largest to-date empirical study of the role of variance (vis-a-vis self-consistency and arbitrariness) in fair classification; and, 4) Release a toolkit that makes the US Home Mortgage Disclosure Act (HMDA) datasets easily usable for future research. Altogether, our experiments reveal shocking insights about the reliability of conclusions on benchmark datasets. Most fairness classification benchmarks are close-to-fair when taking into account the amount of arbitrariness present in predictions -- before we even try to apply common fairness interventions. This finding calls into question the practical utility of common algorithmic fairness methods, and in turn suggests that we should fundamentally reconsider how we choose to measure fairness in machine learning.
    Selective Nonparametric Regression via Testing. (arXiv:2309.16412v1 [stat.ML])
    Prediction with the possibility of abstention (or selective prediction) is an important problem for error-critical machine learning applications. While well-studied in the classification setup, selective approaches to regression are much less developed. In this work, we consider the nonparametric heteroskedastic regression problem and develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point. Unlike existing methods, the proposed one accounts not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor. We prove non-asymptotic bounds on the risk of the resulting estimator and show the existence of several different convergence regimes. Theoretical analysis is illustrated with a series of experiments on simulated and real-world data.
    Cross-Prediction-Powered Inference. (arXiv:2309.16598v1 [stat.ML])
    While reliable data-driven decision-making hinges on high-quality labeled data, the acquisition of quality labels often involves laborious human annotations or slow and expensive scientific measurements. Machine learning is becoming an appealing alternative as sophisticated predictive techniques are being used to quickly and cheaply produce large amounts of predicted labels; e.g., predicted protein structures are used to supplement experimentally derived structures, predictions of socioeconomic indicators from satellite imagery are used to supplement accurate survey data, and so on. Since predictions are imperfect and potentially biased, this practice brings into question the validity of downstream inferences. We introduce cross-prediction: a method for valid inference powered by machine learning. With a small labeled dataset and a large unlabeled dataset, cross-prediction imputes the missing labels via machine learning and applies a form of debiasing to remedy the prediction inaccuracies. The resulting inferences achieve the desired error probability and are more powerful than those that only leverage the labeled data. Closely related is the recent proposal of prediction-powered inference, which assumes that a good pre-trained model is already available. We show that cross-prediction is consistently more powerful than an adaptation of prediction-powered inference in which a fraction of the labeled data is split off and used to train the model. Finally, we observe that cross-prediction gives more stable conclusions than its competitors; its confidence intervals typically have significantly lower variability.
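    A minimal sketch of the cross-fitting idea for estimating a mean outcome, using assumed synthetic data and an arbitrary choice of predictive model; it produces only the point estimate, whereas the paper's contribution centers on the valid confidence intervals built on top of such debiased estimates.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.ensemble import RandomForestRegressor

def cross_prediction_mean(X_lab, y_lab, X_unlab, n_folds=5, seed=0):
    """Cross-fitted imputation plus debiasing: models trained on K-1 folds impute
    labels on the unlabeled pool, and the held-out fold supplies out-of-fold
    residuals that correct the prediction bias."""
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=seed)
    imputed, corrections = [], []
    for train_idx, hold_idx in kf.split(X_lab):
        model = RandomForestRegressor(random_state=seed).fit(X_lab[train_idx], y_lab[train_idx])
        imputed.append(model.predict(X_unlab))                                 # predictions on unlabeled pool
        corrections.append(y_lab[hold_idx] - model.predict(X_lab[hold_idx]))   # out-of-fold residuals
    return np.mean(np.concatenate(imputed)) + np.mean(np.concatenate(corrections))

rng = np.random.default_rng(0)
X_lab, X_unlab = rng.normal(size=(100, 5)), rng.normal(size=(2000, 5))
y_lab = X_lab.sum(axis=1) + rng.normal(size=100)
print(cross_prediction_mean(X_lab, y_lab, X_unlab))
```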
    Private Stochastic Optimization With Large Worst-Case Lipschitz Parameter: Optimal Rates for (Non-Smooth) Convex Losses and Extension to Non-Convex Losses. (arXiv:2209.07403v4 [cs.LG] UPDATED)
    We study differentially private (DP) stochastic optimization (SO) with loss functions whose worst-case Lipschitz parameter over all data points may be extremely large. To date, the vast majority of work on DP SO assumes that the loss is uniformly Lipschitz continuous over data (i.e. stochastic gradients are uniformly bounded over all data points). While this assumption is convenient, it often leads to pessimistic excess risk bounds. In many practical problems, the worst-case (uniform) Lipschitz parameter of the loss over all data points may be extremely large due to outliers. In such cases, the error bounds for DP SO, which scale with the worst-case Lipschitz parameter of the loss, are vacuous. To address these limitations, this work provides near-optimal excess risk bounds that do not depend on the uniform Lipschitz parameter of the loss. Building on a recent line of work (Wang et al., 2020; Kamath et al., 2022), we assume that stochastic gradients have bounded $k$-th order moments for some $k \geq 2$. Compared with works on uniformly Lipschitz DP SO, our excess risk scales with the $k$-th moment bound instead of the uniform Lipschitz parameter of the loss, allowing for significantly faster rates in the presence of outliers and/or heavy-tailed data. For convex and strongly convex loss functions, we provide the first asymptotically optimal excess risk bounds (up to a logarithmic factor). In contrast to (Wang et al., 2020; Kamath et al., 2022), our bounds do not require the loss function to be differentiable/smooth. We also devise a linear-time algorithm for smooth losses that has excess risk that is tight in certain practical parameter regimes. Additionally, our work is the first to address non-convex non-uniformly Lipschitz loss functions satisfying the Proximal-PL inequality; this covers some practical machine learning models. Our Proximal-PL algorithm has near-optimal excess risk.
    CRIMED: Lower and Upper Bounds on Regret for Bandits with Unbounded Stochastic Corruption. (arXiv:2309.16563v1 [stat.ML])
    We investigate the regret-minimisation problem in a multi-armed bandit setting with arbitrary corruptions. Similar to the classical setup, the agent receives rewards generated independently from the distribution of the arm chosen at each time. However, these rewards are not directly observed. Instead, with a fixed $\varepsilon\in (0,\frac{1}{2})$, the agent observes a sample from the chosen arm's distribution with probability $1-\varepsilon$, or from an arbitrary corruption distribution with probability $\varepsilon$. Importantly, we impose no assumptions on these corruption distributions, which can be unbounded. In this setting, accommodating potentially unbounded corruptions, we establish a problem-dependent lower bound on regret for a given family of arm distributions. We introduce CRIMED, an asymptotically-optimal algorithm that achieves the exact lower bound on regret for bandits with Gaussian distributions with known variance. Additionally, we provide a finite-sample analysis of CRIMED's regret performance. Notably, CRIMED can effectively handle corruptions with $\varepsilon$ values as high as $\frac{1}{2}$. Furthermore, we develop a tight concentration result for medians in the presence of arbitrary corruptions, even with $\varepsilon$ values up to $\frac{1}{2}$, which may be of independent interest. We also discuss an extension of the algorithm for handling misspecification in the Gaussian model.
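    An illustrative sketch (not the CRIMED algorithm itself) of why median-based statistics are the natural tool here: with unbounded corruption, the sample mean of an arm's observed rewards can be destroyed while the median remains close to the true Gaussian mean. The corruption distribution below is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def corrupted_samples(mu, n, eps=0.2):
    """Rewards from N(mu, 1), each replaced with probability eps by a draw from an
    arbitrary (here heavy-tailed) corruption distribution."""
    clean = rng.normal(mu, 1.0, size=n)
    corrupt = rng.standard_cauchy(size=n) * 100
    mask = rng.random(n) < eps
    return np.where(mask, corrupt, clean)

# The mean is dragged away by unbounded corruption while the median stays near mu;
# this is the kind of robust statistic the concentration result above concerns.
x = corrupted_samples(mu=1.0, n=10_000)
print("mean:", x.mean(), "median:", np.median(x))
```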
    Constructing Synthetic Treatment Groups without the Mean Exchangeability Assumption. (arXiv:2309.16409v1 [stat.ML])
    The purpose of this work is to transport the information from multiple randomized controlled trials to the target population where we only have the control group data. Previous works rely critically on the mean exchangeability assumption. However, as pointed out by many current studies, the mean exchangeability assumption might be violated. Motivated by the synthetic control method, we construct a synthetic treatment group for the target population by a weighted mixture of treatment groups of source populations. We estimate the weights by minimizing the conditional maximum mean discrepancy between the weighted control groups of source populations and the target population. We establish the asymptotic normality of the synthetic treatment group estimator based on the sieve semiparametric theory. Our method can serve as a novel complementary approach when the mean exchangeability assumption is violated. Experiments are conducted on synthetic and real-world datasets to demonstrate the effectiveness of our methods.  ( 2 min )
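    A simplified sketch of the weighting step, assuming a Gaussian kernel and an unconditional MMD in place of the paper's conditional maximum mean discrepancy; the data and kernel bandwidth are illustrative only.

```python
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2_weighted(weights, source_controls, target):
    """Squared MMD between a weighted mixture of source control groups and the
    target control group (unconditional version, for illustration)."""
    mix_K, cross_K = 0.0, 0.0
    for wi, Si in zip(weights, source_controls):
        cross_K += wi * rbf_kernel(Si, target).mean()
        for wj, Sj in zip(weights, source_controls):
            mix_K += wi * wj * rbf_kernel(Si, Sj).mean()
    return mix_K - 2 * cross_K + rbf_kernel(target, target).mean()

def fit_weights(source_controls, target):
    k = len(source_controls)
    cons = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)  # weights on the simplex
    res = minimize(mmd2_weighted, np.full(k, 1.0 / k), args=(source_controls, target),
                   bounds=[(0.0, 1.0)] * k, constraints=cons)
    return res.x

rng = np.random.default_rng(0)
source_controls = [rng.normal(m, 1.0, size=(80, 2)) for m in (0.0, 1.0, 2.0)]
target = rng.normal(0.7, 1.0, size=(60, 2))
print("mixture weights:", fit_weights(source_controls, target))
```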
    HyperBO+: Pre-training a universal prior for Bayesian optimization with hierarchical Gaussian processes. (arXiv:2212.10538v2 [cs.LG] UPDATED)
    Bayesian optimization (BO), while proved highly effective for many black-box function optimization tasks, requires practitioners to carefully select priors that well model their functions of interest. Rather than specifying by hand, researchers have investigated transfer learning based methods to automatically learn the priors, e.g. multi-task BO (Swersky et al., 2013), few-shot BO (Wistuba and Grabocka, 2021) and HyperBO (Wang et al., 2022). However, those prior learning methods typically assume that the input domains are the same for all tasks, weakening their ability to use observations on functions with different domains or generalize the learned priors to BO on different search spaces. In this work, we present HyperBO+: a pre-training approach for hierarchical Gaussian processes that enables the same prior to work universally for Bayesian optimization on functions with different domains. We propose a two-step pre-training method and analyze its appealing asymptotic properties and benefits to BO both theoretically and empirically. On real-world hyperparameter tuning tasks that involve multiple search spaces, we demonstrate that HyperBO+ is able to generalize to unseen search spaces and achieves lower regrets than competitive baselines.  ( 2 min )
    Transfer Learning for Bayesian Optimization on Heterogeneous Search Spaces. (arXiv:2309.16597v1 [cs.LG])
    Bayesian optimization (BO) is a popular black-box function optimization method, which makes sequential decisions based on a Bayesian model, typically a Gaussian process (GP), of the function. To ensure the quality of the model, transfer learning approaches have been developed to automatically design GP priors by learning from observations on "training" functions. These training functions are typically required to have the same domain as the "test" function (black-box function to be optimized). In this paper, we introduce MPHD, a model pre-training method on heterogeneous domains, which uses a neural net mapping from domain-specific contexts to specifications of hierarchical GPs. MPHD can be seamlessly integrated with BO to transfer knowledge across heterogeneous search spaces. Our theoretical and empirical results demonstrate the validity of MPHD and its superior performance on challenging black-box function optimization tasks.  ( 2 min )
    Nonparametric estimation of a covariate-adjusted counterfactual treatment regimen response curve. (arXiv:2309.16099v1 [math.ST])
    Flexible estimation of the mean outcome under a treatment regimen (i.e., value function) is the key step toward personalized medicine. We define our target parameter as a conditional value function given a set of baseline covariates which we refer to as a stratum based value function. We focus on semiparametric class of decision rules and propose a sieve based nonparametric covariate adjusted regimen-response curve estimator within that class. Our work contributes in several ways. First, we propose an inverse probability weighted nonparametrically efficient estimator of the smoothed regimen-response curve function. We show that asymptotic linearity is achieved when the nuisance functions are undersmoothed sufficiently. Asymptotic and finite sample criteria for undersmoothing are proposed. Second, using Gaussian process theory, we propose simultaneous confidence intervals for the smoothed regimen-response curve function. Third, we provide consistency and convergence rate for the optimizer of the regimen-response curve estimator; this enables us to estimate an optimal semiparametric rule. The latter is important as the optimizer corresponds with the optimal dynamic treatment regimen. Some finite-sample properties are explored with simulations.  ( 2 min )
    Improving Adaptive Online Learning Using Refined Discretization. (arXiv:2309.16044v1 [cs.LG])
    We study unconstrained Online Linear Optimization with Lipschitz losses. The goal is to simultaneously achieve ($i$) second order gradient adaptivity; and ($ii$) comparator norm adaptivity also known as "parameter freeness" in the literature. Existing regret bounds (Cutkosky and Orabona, 2018; Mhammedi and Koolen, 2020; Jacobsen and Cutkosky, 2022) have the suboptimal $O(\sqrt{V_T\log V_T})$ dependence on the gradient variance $V_T$, while the present work improves it to the optimal rate $O(\sqrt{V_T})$ using a novel continuous-time-inspired algorithm, without any impractical doubling trick. This result can be extended to the setting with unknown Lipschitz constant, eliminating the range ratio problem from prior works (Mhammedi and Koolen, 2020). Concretely, we first show that the aimed simultaneous adaptivity can be achieved fairly easily in a continuous time analogue of the problem, where the environment is modeled by an arbitrary continuous semimartingale. Then, our key innovation is a new discretization argument that preserves such adaptivity in the discrete time adversarial setting. This refines a non-gradient-adaptive discretization argument from (Harvey et al., 2023), both algorithmically and analytically, which could be of independent interest.  ( 2 min )

  • Open

    [R] Drive Like a Human: Rethinking Autonomous Driving with Large Language Models
    Paper - https://arxiv.org/abs/2307.07162 submitted by /u/MysteryInc152 [link] [comments]  ( 9 min )
    [Research] - Resource to query ML and LLM based research
    Made a repo for you all to try using a collaborative AI tool which includes 100+ papers on LLM-Based-Agents. You can try out the repo here: https://www.collama.ai/varun/llm-based-agents submitted by /u/_llama2 [link] [comments]  ( 9 min )
    [D] Choosing the best learning model for a start up app?
    Straight off the bat: I am not very familiar with this, but I was tasked with finding and suggesting a reasonable model for our needs. Here is a bit of what I read: https://www.obviously.ai/post/how-to-choose-the-right-ai-model-for-your-application https://www.addevice.io/blog/ai-framework-for-app-development The app that I am working on is an education app, and the purpose of the AI would be to (at least in terms of priority) generate a post subject line / topic to discuss. The company is super small, so money is important. JS is being used mainly at the moment. What would be a good choice for a small start up to generate topics for an education app used by schools? At least any ideas or things to consider would be wonderful to get my rabbit hole dive started! Thanks. submitted by /u/Willy988 [link] [comments]  ( 9 min )
    [R] Gsgen: Text-to-3D using Gaussian Splatting
    Project Page Paper Code In this paper, we present Gaussian Splatting based text-to-3D generation (GSGEN), a novel approach for generating high-quality 3D objects. Previous methods suffer from inaccurate geometry and limited fidelity due to the absence of 3D prior and proper representation. We leverage 3D Gaussian Splatting, a recent state-of-the-art representation, to address existing shortcomings by exploiting the explicit nature that enables the incorporation of 3D prior. Specifically, our method adopts a progressive optimization strategy, which includes a geometry optimization stage and an appearance refinement stage. In geometry optimization, a coarse representation is established under a 3D geometry prior along with the ordinary 2D SDS loss, ensuring a sensible and 3D-consistent rough shape. Subsequently, the obtained Gaussians undergo an iterative refinement to enrich details. In this stage, we increase the number of Gaussians by compactness-based densification to enhance continuity and improve fidelity. With these designs, our approach can generate 3D content with delicate details and more accurate geometry. Extensive evaluations demonstrate the effectiveness of our method, especially for capturing high-frequency components. submitted by /u/Sirisian [link] [comments]  ( 9 min )
    [D] Does anyone else feel like MOJO isn't getting the attention it deserves?
    https://docs.modular.com/mojo/ submitted by /u/hai_cben [link] [comments]  ( 9 min )
    [P] Carton – Run any ML model from any programming language
    Hi! I just open-sourced a project that I've been working on for a while and wanted to see what you think! The goal of Carton (https://carton.run) is to let you use a single interface to run any machine learning model from any programming language. It’s currently difficult to integrate models that use different technologies (e.g. TensorRT, Ludwig, TorchScript, JAX, GGML, etc) into your application, especially if you’re not using Python. Even if you learn the details of integrating each of these frameworks, running multiple frameworks in one process can cause hard-to-debug crashes. Ideally, the ML framework a model was developed in should just be an implementation detail. Carton lets you decouple your application from specific ML frameworks so you can focus on the problem you actually want to solve. At a high level, the way Carton works is by running models in their own processes and using an IPC system to communicate back and forth with low overhead. Carton is primarily implemented in Rust, with bindings to other languages. There are lots more details linked in the architecture doc below. Importantly, Carton uses your model’s original underlying framework (e.g. PyTorch) under the hood to actually execute the model. This is meaningful because it makes Carton composable with other technologies. For example, it’s easy to use custom ops, TensorRT, etc without changes. This lets you keep up with cutting-edge advances, but decouples them from your application. I’ve been working on Carton for almost a year now and I open sourced it on Wednesday! Some useful links: Website, docs, quickstart - https://carton.run Explore existing models - https://carton.pub Repo - https://github.com/VivekPanyam/carton Architecture - https://github.com/VivekPanyam/carton/blob/main/ARCHITECTURE.md Please let me know what you think! submitted by /u/vpanyam [link] [comments]  ( 10 min )
    [P] Location Computation
    Hi Everyone, I’m doing a project where I’m crowdsourcing a lot of location data for a set of location labels and then trying to cluster it for each and using the centroid of the cluster as the most accurate location for that location label. The data keeps coming in everyday. I’m not sure when to stop computation. Initially I thought I’ll check the delta between each days centroid computed and if the delta falls under a threshold then stop computing. But now I’m thinking if my daily data collected gets marked as outliers, subsequent days centroids won’t have much of a delta and it will pass my convergence condition. Any suggestions? submitted by /u/Longjumping-Song4958 [link] [comments]  ( 9 min )
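    One possible stopping rule, sketched below, that sidesteps the outlier concern: recompute a robust (median) centroid on all data collected so far rather than on each day alone, so a day of outliers barely moves the estimate instead of faking convergence. The threshold and patience values are placeholders.

```python
import numpy as np

def centroid_converged(daily_batches, eps=1e-4, patience=3):
    """Stop once the cumulative robust centroid has moved less than `eps`
    (in the same units as the coordinates) for `patience` consecutive days."""
    all_points, prev, streak = [], None, 0
    for batch in daily_batches:
        all_points.append(batch)
        centroid = np.median(np.vstack(all_points), axis=0)  # median is robust to outliers
        if prev is not None and np.linalg.norm(centroid - prev) < eps:
            streak += 1
            if streak >= patience:
                return True, centroid
        else:
            streak = 0
        prev = centroid
    return False, prev

# Toy example: ten days of noisy lat/lon reports for one label
daily = [np.random.default_rng(d).normal([37.77, -122.41], 1e-4, size=(200, 2)) for d in range(10)]
print(centroid_converged(daily))
```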
    [D][R] Deploying deep models on memory constrained devices
    Suppose we want to use a deep learning model on a gpu within our app. We want this model to coexist on the gpu with other processes, effectively limiting its possible usage of resources. As cuDNN/cuBLAS routines are nondeterministic and possibly dynamically allocate variable amounts of memory, how do people manage this problem? Is it a problem at all? Estimating memory usage of deep learning models on gpu is notoriously hard. There is a research paper from Microsoft tackling this problem and they mispredict the usage of memory by 15% on average. Some cpu BLAS libraries like openBLAS or MKL also dynamically allocate the memory, but there are alternatives - LAPACK as far as I know uses only the memory provided by the caller, making it a viable option for applications in embedded. In safety crit…  ( 10 min )
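    One practical mitigation in PyTorch, sketched below: it does not make cuDNN/cuBLAS workspace allocations deterministic, but capping the process's memory fraction turns silent overcommit into a catchable out-of-memory error, and the allocator's peak statistics give an empirical upper bound to budget against. The fraction and model are placeholders.

```python
import torch

if torch.cuda.is_available():
    # Cap this process's share of GPU memory so it can coexist with other processes
    torch.cuda.set_per_process_memory_fraction(0.3, device=0)

    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 64, 3, padding=1), torch.nn.ReLU(),
        torch.nn.Conv2d(64, 64, 3, padding=1),
    ).cuda()

    with torch.inference_mode():
        model(torch.randn(8, 3, 224, 224, device="cuda"))

    # Peak statistics from the caching allocator give an empirical upper bound
    print("peak allocated MB:", torch.cuda.max_memory_allocated() / 2**20)
    print("peak reserved  MB:", torch.cuda.max_memory_reserved() / 2**20)
```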
    [D] Best Sequence Embedding Models?
    Which are currently the best Sentence Embedding pre-trained models out there? submitted by /u/Uilxitora [link] [comments]  ( 9 min )
    [D] Using Gamification to demystify the AI black-box
    Blog about AI "black box" nature and how it can be explained and become engaging to users using gamification. Explained with example from open-appsec an open-source machine learning-based Web Application & API Security product. https://www.openappsec.io/post/using-gamification-to-demystify-the-ai-black-box-in-a-waf-product https://github.com/openappsec/openappsec submitted by /u/onirisapp [link] [comments]  ( 9 min )
    [Project] Startup Job Post/Contractor role
    Hey all! I'm in the throes of doing a startup and looking for someone to help build a legal tech platform. I can discuss more in person, but it is intended to be a human/lawyer in the loop workflow tool for complex contract and deal analysis. Base product is built and deployed. I'm a former developer turned lawyer with 15 years of corporate experience, and need help/talent/co-founder to help take things to the next level. Ideally you have a mixture of NLP and regular software engineering background and just a very practical approach. If you've played with LLM's all the better. Options for cash, equity, larger roles are all on the table. Just looking for the right talent. DM me if you are interested and let's talk about experience, etc.! And it seems that tags are turned off in here, so not sure how to tag something as [Project] but I put it in the title. submitted by /u/pudgyplacater [link] [comments]  ( 9 min )
    [R] RealFill: Reference-Driven Generation for Authentic Image Completion
    Project page: https://realfill.github.io/ Paper: https://arxiv.org/abs/2309.16668 RealFill is able to complete the image with what should have been there. Abstract Recent advances in generative imagery have brought forth outpainting and inpainting models that can produce high-quality, plausible image content in unknown regions, but the content these models hallucinate is necessarily inauthentic, since the models lack sufficient context about the true scene. In this work, we propose RealFill, a novel generative approach for image completion that fills in missing regions of an image with the content that should have been there. RealFill is a generative inpainting model that is personalized using only a few reference images of a scene. These reference images do not have to be aligned with the target image, and can be taken with drastically varying viewpoints, lighting conditions, camera apertures, or image styles. Once personalized, RealFill is able to complete a target image with visually compelling contents that are faithful to the original scene. We evaluate RealFill on a new image completion benchmark that covers a set of diverse and challenging scenarios, and find that it outperforms existing approaches by a large margin. ​ submitted by /u/StrawberryNumberNine [link] [comments]  ( 9 min )
    [R] Listen2Scene: Interactive material-aware binaural sound propagation for reconstructed 3D scenes
    https://www.youtube.com/watch?v=aNJWCwG-H_U submitted by /u/Snoo63916 [link] [comments]  ( 9 min )
    [R] M3-AUDIODEC: Multi-channel multi-speaker multi-spatial audio codec
    Paper : https://arxiv.org/abs/2309.07416 Demo : https://anton-jeran.github.io/MAD/ Code : https://github.com/anton-jeran/MULTI-AUDIODEC submitted by /u/Snoo63916 [link] [comments]  ( 9 min )
    [R] The Future of Romance: Novel Techniques for Replacing your Boyfriend with Generative AI (parody)
    submitted by /u/TobyWasBestSpiderMan [link] [comments]  ( 9 min )
    Classical NLP course [D]
    Classical NLP course recommendation: Can you recommend an NLP course that dives into classical NLP methods? For example: HMM, MaxEnt, the CKY algorithm, syntactic parsing, dependency parsing. submitted by /u/Thick-brain-dude [link] [comments]  ( 9 min )
    [D] Multi-task learning leads to overfitting. Is this the double descent phenomenon?
    I have a CNN model, call it model M. It was trained on dataset A for object pose estimation. After training for 100 epochs, it resulted in these losses: Train: 0.06 Val: 0.08 As dataset A is somewhat limited, I wonder if I can incorporate additional data via a different, but related task: object segmentation for similar objects. Model M is a UNet, so I can incorporate this task simply with an additional output channel in the last layer. I add dataset B for object segmentation. During training, M learns on both datasets quite well, which suggests to me that the tasks are well-aligned. After 100 epochs, I get these losses on dataset A: Train: 0.06 Val: 0.16 This is surprising to me. If I get the same training loss on dataset A while training on additional data, I'd expect the validation loss to be lower, since I'm training on 2x the data. Yet the validation loss is consistently higher when I train on both datasets. The only explanation I can think of is the double descent phenomenon. Perhaps when I trained only on dataset A, I was significantly over-parameterized, but past the interpolation threshold. So perhaps adding more data brought me closer to the interpolation threshold, leading to worse generalization. Does this explanation seem likely? Has anyone had similar experiences? submitted by /u/murrdpirate [link] [comments]  ( 9 min )
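    For reference, a toy sketch of the setup described in the post (shared features, pose channels plus one extra segmentation channel, and a per-batch loss that depends on which dataset the batch came from); the channel counts and losses are assumptions, not the poster's actual code.

```python
import torch
import torch.nn as nn

class TwoHeadedOutput(nn.Module):
    """Stand-in for the last layer of the U-Net: the first channels carry the
    pose-estimation output, the extra channel carries the segmentation logits."""
    def __init__(self, in_ch=16, pose_ch=3):
        super().__init__()
        self.head = nn.Conv2d(in_ch, pose_ch + 1, kernel_size=1)
        self.pose_ch = pose_ch

    def forward(self, features):
        out = self.head(features)
        return out[:, :self.pose_ch], out[:, self.pose_ch:]   # pose maps, seg logits

def step_loss(pose_pred, seg_pred, batch):
    # Each batch only back-propagates the loss of the task its dataset provides
    if batch["task"] == "pose":                # dataset A: pose loss only
        return nn.functional.mse_loss(pose_pred, batch["target"])
    return nn.functional.binary_cross_entropy_with_logits(seg_pred, batch["target"])

head = TwoHeadedOutput()
pose_pred, seg_pred = head(torch.randn(2, 16, 32, 32))
print(step_loss(pose_pred, seg_pred, {"task": "pose", "target": torch.randn(2, 3, 32, 32)}))
```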
    [D] What's the relationship between Denoising Autoencoders and Diffusion Models?
    Hello, denoising autoencoders is when you train something to reverse x+n -> x. This seems to be basically the same as a diffusion model, more so if you see the U-Net diffusion model, which is effectively an information bottleneck. submitted by /u/windoze [link] [comments]  ( 9 min )
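    A minimal sketch of the contrast: the denoising autoencoder below reconstructs x from x + noise at one fixed noise level, while the simplified diffusion-style objective trains across random noise levels, conditions the network on the level t, and predicts the noise; the noise schedule here is a crude assumption rather than any standard DDPM schedule.

```python
import torch
import torch.nn as nn

net_dae = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 10))
net_diff = nn.Sequential(nn.Linear(11, 64), nn.ReLU(), nn.Linear(64, 10))  # +1 input for t

x = torch.randn(32, 10)

# Denoising autoencoder loss: reconstruct x from x + noise at a single sigma
sigma = 0.1
dae_loss = ((net_dae(x + sigma * torch.randn_like(x)) - x) ** 2).mean()

# Simplified diffusion-style loss: random noise level per sample, epsilon prediction
t = torch.rand(32, 1)                       # random noise level in [0, 1]
eps = torch.randn_like(x)
x_noisy = torch.sqrt(1 - t) * x + torch.sqrt(t) * eps
diff_loss = ((net_diff(torch.cat([x_noisy, t], dim=1)) - eps) ** 2).mean()

print(dae_loss.item(), diff_loss.item())
```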
    [D] How is this sub not going ballistic over the recent GPT-4 Vision release?
    For a quick disclaimer, I know people on here think the sub is being flooded by people who aren't ml engineers/researchers. I have worked at two FAANGS on ml research teams/platforms. My opinion is that GPT-4 Vision/Image processing is out of science fiction. I fed chatgpt an image of a complex sql database schema, and it converted it to code, then optimized the schema. It understood the arrows pointing between table boxes on the image as relations, and even understood many-to-one/many-to-many. I took a picture of random writing on a page, and it did OCR better than has ever been possible. I was able to ask questions that required OCR and a geometrical understanding of the page layout. Where is the hype on here? This is an astounding human breakthrough. I cannot believe how much ML is now obsolete as a result. I cannot believe how many computer science breakthroughs have occurred with this simple model update. Where is the uproar on this sub? Why am I not seeing 500 comments on posts about what you can do with this now? Why are there even post submissions about anything else? submitted by /u/corporate_autist [link] [comments]  ( 9 min )
    [P] vLLM with Mistral 7B guide
    Hey all - vllm==0.2.0 got released a couple of hours ago and I put together some code to get it running with the new Mistral 7B model. Also included are some benchmarks for different input batch sizes with the model (output capped at 200 tokens): batch size 1: 46 tokens/s; batch size 10: 400 tokens/s; batch size 60: 1.8k tokens/s. Hope it's useful, let me know if you'd like any more info! Here's the link: https://docs.mystic.ai/docs/mistral-ai-7b-vllm-fast-inference-guide submitted by /u/paulcjh [link] [comments]  ( 9 min )
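    A minimal sketch of the kind of vLLM script the post describes (the model name, sampling arguments, and batch size are assumptions, and exact argument names may differ across vllm versions; the linked guide is authoritative).

```python
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-v0.1")
params = SamplingParams(temperature=0.8, max_tokens=200)       # output capped at 200 tokens

prompts = ["Explain KV-cache paging in two sentences."] * 10   # batch of 10 prompts
for out in llm.generate(prompts, params):
    print(out.outputs[0].text)
```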
  • Open

    Bing AI chat messages are being hijacked by ads pushing malware
    Bing AI chat messages are being hijacked by ads pushing malware. Malvertising has made its way to Bing's chatbot/search engine. Cybersecurity researchers observed a malicious ad being offered as part of the Chat-GPT, AI-powered answer to a search query. Malvertising is a practice where hackers trick ad networks into displaying ads that look legitimate but are actually malicious. Microsoft integrated Chat-GPT into Bing earlier this year and started monetizing it. When a user types in a query, they would get a result paired with sponsored links. In this instance, researchers were given a link that redirected them to a malicious site. Threat actors continue to leverage search ads to redirect users to malicious sites hosting malware. Bing Chat serves some of the same ads seen via a traditional Bing query. Source : https://www.techradar.com/pro/security/bing-ai-chat-messages-are-being-hijacked-by-ads-pushing-malware submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Crafting Virtual Worlds With Just Words. How AI Changes 3D World Building Forever.
    submitted by /u/Magic-Fabric [link] [comments]  ( 9 min )
    AI — weekly megathread!
    News provided by aibrews.com Meta AI presents Emu, a quality-tuned latent diffusion model for generating highly aesthetic images. Emu significantly outperforms SDXLv1.0 on visual appeal [Paper]. Meta AI researchers present a series of long-context LLMs with context windows of up to 32,768 tokens. LLAMA 2 70B variant surpasses gpt-3.5-turbo-16k’s overall performance on a suite of long-context tasks [Paper]. Abacus AI released a larger 70B version of Giraffe. Giraffe is a family of models that are finetuned from base Llama 2 and have a larger context length of 32K tokens [Details]. Meta announced [Details]: Meta AI - a new AI assistant users can interact with on WhatsApp, Messenger and Instagram. Will also be available on Ray-Ban Meta smart glasses and Quest 3, Meta’s mixed reality h…  ( 12 min )
    I Asked ChatGPT to be my Girlfriend: And it said Yes!
    This is a quick fun project, nothing serious at all, a personalized custom instruction for ChatGPT to make it conversational and maintain character throughout the interaction. If you want to learn more serious and useful prompt engineering techniques head on: r/PromptWizards. In all seriousness, even though this is just for fun, such applied prompt engineering for NPC in games, or online AI companion services can actually be relevant and useful in the future. By initializing this Girlfriend RolePlaying ChatGPT mode, you're not only interacting with an AI but with Sarah, 25, who is keen to explore several layers of a relationship with you. Each command you use brings you a different shade of companionship: /start - Let Sarah introduce herself. /chat - Engages you in a comforting and c…  ( 11 min )
    Exploring Jimmy Apples Claim: "The AGI has been achieved internally" - Detailed Reddit Investigation
    submitted by /u/stefanbg92 [link] [comments]  ( 9 min )
    This week's AI digest
    Planet friendly: Researchers say AI emits up to 1,500 times less CO2 than humans when producing a page of text. Some disagree. Banking: Currently, around 41% of US bank customers are digital only Meta is launching AI chatbots across its apps to retain younger users. Amazon invests $4 Billion in OpenAI competitor Anthropic. Emerging tiger: Nvidia’s CEO bets India will emerge as a major AI market. Regulation: OpenAI CEO Sam Altman advocates for AI regulation despite risks. Suspense: Elon Musk says AI image generation app Midjourney will be releasing “something significant” soon. submitted by /u/unbalanced_mind [link] [comments]  ( 9 min )
    CGPT-4, it is an indisputable fact that Trump waited three hours to end the insurrection violence. What do his supporters guess was his thinking during this time?
    Some posit he was genuinely surprised by the violence and needed time to assess the situation before acting. They argue that the chaotic nature of the events made immediate action complex, given the layers of command and decision-making involved. Others claim he might have been in deliberation with advisors to gauge the scale and implications of intervention, debating the potential backlash from his supporters or the legal ramifications. Another perspective suggests that he might have been contemplating how the events would affect the certification of the Electoral College results, given that his previous legal and political efforts to contest the 2020 election outcome had failed. In this view, he might have been waiting to see if the Congress would be influenced to halt or delay the certification. While some of his supporters may find these explanations plausible, critics argue that the delay represents a dereliction of duty or even tacit support for the violence. submitted by /u/Georgeo57 [link] [comments]  ( 9 min )
    How to Connect ChatGPT to the Internet (Step-by-Step Guide)
    submitted by /u/Senior_tasteey [link] [comments]  ( 9 min )
    Any "free" ai to turn text to speech?
    I am looking for an ai that will turn the text to speech and be free. submitted by /u/Korti213 [link] [comments]  ( 9 min )
    Looking for some help on a project
    Hey y’all, I’ve been seeing these clips everywhere of AI streamers, and I’ve been searching Everywhere for explanations of how to make one. I believe I understand the concepts, but I’m really at a loss for the avatar text to speech part. I believe I have it ready for collecting questions and getting it to ChatGPT for response/script, but im very stuck at using a photo for an avatar that can mouth the words and not take 3 mins per response. Any help is appreciated, I’ve been at this project for longer than I’d like lmao. The attached video is a random YouTube short for reference submitted by /u/Lipoz69 [link] [comments]  ( 9 min )
    He got Facebook hooked on AI. Now he can't fix its misinformation addiction
    Facebook's addiction to spreading misinformation and hate speech is a result of its AI algorithms. Joaquin Quiñonero Candela, a director of AI at Facebook, was tasked with fixing the problem but was only focused on addressing AI bias. The Responsible AI team failed to make headway against misinformation and hate speech because it never made those problems its main focus. The spread of lies and hate speech on Facebook has only grown, contributing to genocidal campaigns and the promotion of dangerous falsehoods. The algorithms that underpin Facebook's business were designed to maximize engagement, not filter out false or inflammatory content. Source : https://www.technologyreview.com/2021/03/11/1020600/facebook-responsible-ai-misinformation/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Album covers but Morgan Freeman
    submitted by /u/TheGhettoControversy [link] [comments]  ( 9 min )
    Google is expanding its AI-powered search experience to teenagers
    Google's AI-driven search experience, Search Generative Experience (SGE), is now accessible to teenagers between 13-17 in America. Entailments include a conversational mode for searches, which Google believes can help youngsters pose atypical questions to dig deeper. For the latest advancements in AI, look here first. Teen-friendly AI search SGE introduces a conversational mode to Google Search, allowing users to ask questions and follow-ups in a more natural language. To prevent harmful content from surfacing, Google has placed guardrails, providing stronger protections related to illegal and age-gated substances, or bullying. Features and improving AI accuracy Google is rolling out "About this result" to provide users with more context about the displayed content. Google acknowledges and addresses any validation of false or offensive claims by the AI-powered response, ensuring to provide higher quality and more accurate responses. It’s also using large language models to self-critique and rewrite draft responses on sensitive topics based on quality and safety principles. SGE's popularity and future plans Since SGE's introduction, it has found popularity, especially among younger users who prefer a conversational approach. Google plans to expand SGE outside the U.S. to India and Japan and improve its services with support for videos, images, local info, and more. It's also experimenting with ads positioned next to the AI-generated responses. (source) P.S. If you like this kind of analysis, I write a free newsletter that tracks the most relevant news and research in AI and tech. Professionals from Google, Meta, and OpenAI are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
  • Open

    Build a crop segmentation machine learning model with Planet data and Amazon SageMaker geospatial capabilities
    In this analysis, we use a K-nearest neighbors (KNN) model to conduct crop segmentation, and we compare these results with ground truth imagery on an agricultural region. Our results reveal that the classification from the KNN model is more accurately representative of the state of the current crop field in 2017 than the ground truth classification data from 2015. These results are a testament to the power of Planet’s high-cadence geospatial imagery. Agricultural fields change often, sometimes multiple times a season, and having high-frequency satellite imagery available to observe and analyze this land can provide immense value to our understanding of agricultural land and quickly-changing environments.  ( 15 min )
  • Open

    Innovative Endeavors: Meta Introduces AI-Powered Tools and Smart Glasses
    submitted by /u/Allinhalf [link] [comments]  ( 9 min )
    Pruning a specific dimension in a neural network using L1-norm
    I've been playing around with pruning neural networks. One interesting thing I've found is that pruning the weights with the lowest L1-norm along a specific dimension seems to give better results than simply pruning all of the weights with the lowest L1-norm (which I believe is the standard method; for example, this is what torch.nn.utils.prune.l1_unstructured does). Does anyone have an explanation for why this might be, or know of any research in this area? I'm aware that structured pruning removes entire channels in a specific dimension. But I'm referring to unstructured pruning here, where I remove a subset of the weights along a specific dimension. Admittedly I've only done very limited benchmarking of this. See this repo for my implementation, and some benchmark details. submitted by /u/Neilf79 [link] [comments]  ( 9 min )
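    For readers unsure what "lowest L1-norm along a specific dimension" means in practice, here is a small sketch contrasting the standard global l1_unstructured call with a hand-rolled per-row variant (the per-row function is an illustration written for this post, not a torch built-in, and may differ from the linked repo's implementation):

        import torch
        import torch.nn as nn
        import torch.nn.utils.prune as prune

        layer = nn.Linear(64, 32)

        # Standard unstructured pruning: zero the 50% of weights with smallest |w| globally.
        prune.l1_unstructured(layer, name="weight", amount=0.5)
        prune.remove(layer, "weight")  # make the pruning permanent

        # Hand-rolled variant: prune the smallest-|w| 50% *within each output row* (dim 0),
        # so every row loses the same fraction of its weights.
        def prune_per_row(weight: torch.Tensor, amount: float) -> torch.Tensor:
            mask = torch.ones_like(weight)
            k = int(amount * weight.size(1))  # number of weights to drop per row
            if k > 0:
                # indices of the k smallest-magnitude entries in each row
                _, idx = torch.topk(weight.abs(), k, dim=1, largest=False)
                mask.scatter_(1, idx, 0.0)
            return weight * mask

        with torch.no_grad():
            layer.weight.copy_(prune_per_row(layer.weight, amount=0.5))

    The per-row version guarantees no single output unit is pruned much harder than the others, which is one plausible reason it can behave better than a purely global threshold.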
    Help understanding ai, Specificaly cnn cause i want to try training a model on mnist data set as my first project
    Hello, so I learnt the very basics of AI and I'm trying to understand how a NN works; this is what I have figured out so far. If I have a 4x4 image, e.g. [[0,1,1,0],[1,0,0,1],[1,1,1,1],[1,0,0,1]], and I pass a 2x2 kernel, e.g. [[1,1],[0,3]], across it, then the dot product of [[0,1],[1,0]] with [[1,1],[0,3]] is 1, and if we do that for every position we get a new matrix [[1,2,4],[4,1,3],[5,4,4]]. Then with "same" padding this becomes [[0,0,0,0],[1,2,4,0],[4,1,3,0],[5,4,4,0]]. We then turn it into a feature map, basically flattening it to something like 0,0,0,0,1,2,4,0,4,1,3,0,5,4,4,0, so the input has 16 features. If we have a layer of 3 neurons with a ReLU activation function, and each weight alternates between 1 and 2 for simplicity's sake, we would do 0*1 + 0*2 + 0*1 ... + 4*2 + 0*1 = 32. So if we are using ReLU, we check: is 32 > 0? If so we pass 32 to the next neuron, if not we pass 0? I don't know the rest, I guess I forgot what uni taught me 😅 Here's a diagram I drew, maybe you can help me figure out the rest; I'm confused about how the output layer works, I guess. submitted by /u/SaadPaad2003 [link] [comments]  ( 9 min )
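    As a sanity check of the arithmetic described in the post, here is a tiny NumPy sketch of the same 4x4 image, 2x2 kernel, valid convolution, and one ReLU neuron with made-up alternating weights. For brevity it skips the padding step and feeds the 3x3 valid output directly; the weights are illustrative, just like in the post.

        import numpy as np

        img = np.array([[0, 1, 1, 0],
                        [1, 0, 0, 1],
                        [1, 1, 1, 1],
                        [1, 0, 0, 1]])
        kernel = np.array([[1, 1],
                           [0, 3]])

        # "Valid" 2x2 convolution (cross-correlation, as in most DL libraries): 3x3 output.
        out = np.zeros((3, 3))
        for i in range(3):
            for j in range(3):
                out[i, j] = np.sum(img[i:i+2, j:j+2] * kernel)
        print(out)  # the feature map before any padding

        # Flatten and feed to a single dense neuron with alternating weights 1, 2, 1, 2, ...
        features = out.flatten()
        weights = np.array(([1, 2] * ((features.size + 1) // 2))[:features.size])
        pre_activation = features @ weights
        activation = max(pre_activation, 0.0)  # ReLU: pass the value if positive, else 0
        print(pre_activation, activation)

    A real CNN then stacks more such neurons into a layer and finishes with an output layer (e.g. a softmax over the 10 MNIST classes) trained with backpropagation.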
    Adapt GAN
    Hi everyone, I'm new to neural networks and I wanted some advice. I want to generate grayscale images with certain properties: distribution of pixel values, spatial correlation between pixels, etc. I already know the type of result that I need, but I want to know whether a neural network, especially a GAN, is capable of producing images fitting my requirements. I was thinking that maybe I could change the GAN architecture such that: 1) the real data inputs (normally images fed to the discriminator) will simply be the statistical parameters that I am expecting; 2) I'll add a measure of the various statistical parameters on all the synthetic images generated; 3) finally, the discriminator will base itself only on the comparison of statistical parameters for weight updates. Does such a network make sense? If so, I'm having trouble finding a way to implement it, but that is another story. Right now I want to know if this is doable. If not, do you have any alternative suggestion for my issue? Thanks all for your advice! submitted by /u/Hectorite [link] [comments]  ( 9 min )
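    One way to make the idea above concrete, purely as a sketch of a possible design rather than a claim that it will train well: compute the target statistics from each generated image and penalize the generator for deviating from them (instead of, or in addition to, a learned discriminator). All shapes, statistics, and target values below are illustrative assumptions.

        import torch

        def image_stats(imgs: torch.Tensor) -> torch.Tensor:
            """Summary statistics per image: mean, std, and lag-1 horizontal autocorrelation."""
            mean = imgs.mean(dim=(1, 2))
            std = imgs.std(dim=(1, 2))
            centered = imgs - mean[:, None, None]
            autocorr = (centered[:, :, :-1] * centered[:, :, 1:]).mean(dim=(1, 2)) / (std ** 2 + 1e-8)
            return torch.stack([mean, std, autocorr], dim=1)

        # Target statistics the generated grayscale images should match (made-up values).
        target = torch.tensor([0.5, 0.2, 0.7])

        # Stand-in for generator output: in a real setup this comes from the network.
        fake = torch.rand(16, 32, 32, requires_grad=True)
        stats_loss = ((image_stats(fake) - target) ** 2).mean()
        stats_loss.backward()  # gradients would flow back into the generator's parameters

    Whether this beats a plain statistics-matching loss without any adversarial component is an empirical question; it may be worth trying the non-adversarial version first.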
    Why Batch Norm Works
    submitted by /u/Personal-Trainer-541 [link] [comments]  ( 9 min )
  • Open

    Meta's Technological Marvel: AI-Powered Tools and Intuitive Smart Glasses
    submitted by /u/Allinhalf [link] [comments]  ( 9 min )
    Why is dyna Q not outperforming Q learning in terms of sample efficiency?
    I coded a Dyna-Q implementation based on the algorithm given in Sutton's book over here. However, it seems like both are equally sample efficient on the cliff walking environment. Here is my code. These are my results - https://preview.redd.it/z7xwow5hz7rb1.png?width=585&format=png&auto=webp&s=90b33eb4c754e199e9bf15499a78e0f42e05f5d2 The only thing that came to my mind was to increase the model sampling rate (`n_iters`). Even after assigning a large value to it, the performance doesn't change. submitted by /u/Academic-Rent7800 [link] [comments]  ( 9 min )
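    For anyone comparing implementations, a minimal tabular Dyna-Q update step looks roughly like the sketch below (names such as n_planning are illustrative and not taken from the linked code). In a small deterministic environment like cliff walking, the planning updates mostly replay transitions the agent already estimates well, which is one common reason the gap over plain Q-learning can be small.

        import random
        from collections import defaultdict

        def dyna_q_step(Q, model, s, a, r, s_next, alpha=0.1, gamma=0.99, n_planning=10):
            """One real Q-learning update followed by n_planning simulated updates."""
            # Direct RL update from the real transition.
            Q[s][a] += alpha * (r + gamma * max(Q[s_next]) - Q[s][a])
            # Model learning: remember the (deterministic) outcome of (s, a).
            model[(s, a)] = (r, s_next)
            # Planning: replay random previously observed (state, action) pairs.
            for _ in range(n_planning):
                (ps, pa), (pr, ps_next) = random.choice(list(model.items()))
                Q[ps][pa] += alpha * (pr + gamma * max(Q[ps_next]) - Q[ps][pa])

        # Example containers: Q maps state -> list of action values, model maps (s, a) -> (r, s').
        Q = defaultdict(lambda: [0.0] * 4)
        model = {}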
    How can I config and build MJPC c++ software?
    I'm trying to install and run this open-source project https://github.com/google-deepmind/mujoco_mpc. It's called MJPC, and it's C++ software that displays a real-time interactive interface. I've cloned the code, installed CMake, and gcc version 13.1.0 to run C++20. I've also installed the CMake Tools and C/C++ extensions in VSCode as instructed. However, I'm not sure what to do next. I have no experience with C++, configuring VSCode, or building software. Please provide detailed guidance if you can. submitted by /u/Nghiattk27 [link] [comments]  ( 9 min )
    LLM Agents for RL envs
    Has anyone here tried using LLM Agents to solve RL environments? I'm curious about your experiences. Considering that performing a single action involves a chain of thoughts, how fast did your experiments go? Please feel free to add any additional comments about this. Cheers! submitted by /u/stinoco [link] [comments]  ( 9 min )
    Shape Formation with Multi-Agent Reinforcement Learning
    Hey everyone, I'm trying to write MARL code with the MAPPO policy to train three agents to form a triangle shape. I'm relatively new to RL, having completed the fundamentals, but I'm struggling to find suitable resources that can teach me how to implement this in Python. I'd be really grateful if someone could share some insights or useful resources where I can learn to code and implement MARL. submitted by /u/The_One263 [link] [comments]  ( 9 min )
    Curiosity/ Exploration with Rllib
    Hi! I’ve been training a MultiAgentEnv with Curiosity, but I’d like to extend my action space to be a Dictionary. Are there any similar modules I could use instead or is there any way to use Curiosity with a Dictionary consisting of a Box and a Discrete action space. Thank you! submitted by /u/tessherelurkingnow [link] [comments]  ( 9 min )
  • Open

    Regular solids and Monte Carlo integration
    Monte Carlo integration is not as simple in practice as it is often introduced. A homework problem might ask you to integrate a function of two variables by selecting random points from a cube and counting how many of the points fall below the graph of the function. This would indeed give you an estimate […] Regular solids and Monte Carlo integration first appeared on John D. Cook.  ( 6 min )
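    The "count points under the graph" homework version described above fits in a few lines. Here is a minimal sketch for a toy integrand over the unit square (the function, bound M, and sample count are arbitrary choices for illustration):

        import numpy as np

        rng = np.random.default_rng(0)

        def f(x, y):
            return x * x + y * y   # toy integrand on [0,1]^2; the exact integral is 2/3

        n = 100_000
        M = 2.0                     # upper bound of f on the unit square
        x = rng.random(n)
        y = rng.random(n)
        z = rng.uniform(0.0, M, n)  # heights sampled in the box [0,1]^2 x [0, M]

        # Hit-or-miss estimate: box volume times the fraction of points under the graph.
        hit_or_miss = M * np.mean(z < f(x, y))

        # Plain sample-mean estimate, usually lower variance for the same sample count.
        sample_mean = np.mean(f(x, y))

        print(hit_or_miss, sample_mean)  # both should be close to 0.6667

    The practical subtleties the post alludes to (choosing a good bounding region, variance reduction, high dimensions) are exactly what this naive version glosses over.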
  • Open

    Who will benefit from AI?
    In campus talk, Daron Acemoglu offers vision of “machine usefulness,” rather than autonomous “intelligence,” to help workers and spread prosperity.  ( 11 min )
  • Open

    Heeding Huang’s Law: Video Shows How Engineers Keep the Speedups Coming
    In a talk, now available online, NVIDIA Chief Scientist Bill Dally describes a tectonic shift in how computer performance gets delivered in a post-Moore’s law era. Each new processor requires ingenuity and effort inventing and validating fresh ingredients, he said in a recent keynote address at Hot Chips, an annual gathering of chip and systems Read article >  ( 6 min )

  • Open

    Cross-platform way to enter Unicode characters
    The previous post describes the hoops I jumped through to enter Unicode characters on a Mac. Here’s a script to run from the command line that will copy Unicode characters to the system clipboard. It runs anywhere the Python module pyperclip runs. #!/usr/bin/env python3 import sys import pyperclip cp = sys.argv[1] ch = eval(f"chr(0x{cp})") print(ch) […] Cross-platform way to enter Unicode characters first appeared on John D. Cook.  ( 5 min )
    Using Unicode on MacOS
    Setting up Unicode on my MacBook took some research, so I’m leaving myself a note here if I need to do it again. Maybe it’ll help someone else too. From the System Settings dialog, go to Keyboard and click the Edit button next to Input Sources. Click on the + sign in the lower left […] Using Unicode on MacOS first appeared on John D. Cook.  ( 5 min )
  • Open

    The Creator (2023) movie discussion
    In theaters now. PG-13. Synopsis from Fandango (mild spoilers) From writer/director Gareth Edwards (“Rogue One,” “Godzilla”) comes an epic sci-fi action thriller set amidst a future war between the human race and the forces of artificial intelligence. Joshua (John David Washington, "Tenet"), a hardened ex-special forces agent grieving the disappearance of his wife (Gemma Chan, "Eternals"), is recruited to hunt down and kill the Creator, the elusive architect of advanced AI who has developed a mysterious weapon with the power to end the war… and mankind itself. Joshua and his team of elite operatives journey across enemy lines, into the dark heart of AI-occupied territory… only to discover the world-ending weapon he’s been instructed to destroy is an AI in the form of a young child (newcomer Madeleine Yuna Voyles). Trailer If there is any other media I should make threads for just let me know- could be video games, television, etc. submitted by /u/jaketocake [link] [comments]  ( 9 min )
    Aryn comes out of stealth to bring GenAI to OpenSearch and data preparation
    Aryn, a team with experience in AWS big data and database services, has come out of stealth and raised $7.5M in series seed funding. Their mission is to bring generative AI to OpenSearch and data preparation. They aim to use generative AI models to process unstructured data for tasks such as information extraction, question-answering, summarization, and content generation. Aryn's conversational search approach empowers users to interact with their unstructured enterprise data. They have developed a conversational search stack consisting of a semantic data preparation system called Sycamore, semantic search with OpenSearch, and conversational capabilities in OpenSearch. Generative AI powers each component of the stack, leading to higher quality answers and ease of use. Developers can quickly build and deploy applications like question-answering, chatbots, and research platforms using Aryn's stack without needing expertise in AI and search. Aryn's stack is 100% open source, making it accessible to developers. Source : https://blog.aryn.ai/aryn-bringing-generative-ai-to-opensearch-and-data-preparation submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Why does this read like someone used chatdev and gave it a marketing agent named clover with access to a reddit account?
    submitted by /u/Lesbianseagullman [link] [comments]  ( 9 min )
    Meta Unfolds a 'Universe of AI' Across Instagram, Facebook, and WhatsApp
    Meta has unveiled colossal AI updates peppered across its platform that would fundamentally alter user experiences on Instagram, Facebook, and WhatsApp, opening up a "universe of AI" solutions. For the latest advancements in AI, look here first. https://preview.redd.it/bl81rlbqp1rb1.png?width=2048&format=png&auto=webp&s=be44b8ebae8f65b53eb82fe2a78b45f19260c452 Spearheading the AI Universe - Meta AI Chatbot The “advanced conversational assistant” is set to enhance Messenger, WhatsApp, and Instagram services and will be incorporated into upcoming Ray-Ban Meta smart glasses and Quest 3. Real-time information capabilities have been bolstered through a partnership with Microsoft Bing, and image generation is powered by a new model, Emu. A Galaxy of AI Personalities Meta rolled out 28 AIs in beta, featuring sterling personas such as Snoop Dogg, Tom Brady, Kendall Jenner, and Naomi Osaka, thus amplifying the interactivity quotient. AI Studio - Empowering Businesses The AI Studio Platform is equipped to enable businesses to build AI chatbots for messaging services on Facebook, Instagram, and Messenger. Also, Meta will provide a sandbox tool in the upcoming year for users to experiment with creating their own AI. Generative AI Stickers - A New Co-creating Experience AI editing tools will allow users to edit images and co-create content with friends. The tool uses Llama 2 and the new image generation model, Emu, to convert text prompts into stickers in seconds. Ray-Ban Smart Glasses with Meta AI The Ray-Ban smart glasses are equipped with Meta AI, allowing users to receive information, incite creativity, and manage the glasses using just their voice. (source) P.S. If you like this kind of analysis, I write a free newsletter with the latest and most impactful news in AI. Professionals from Google, Meta, and OpenAI read it daily. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    Get a job as a Prompt Engineer - Challenge: generate SAT-Style Multiple Choice Questions.
    One member on r/PromptWizards just applied for a job as a Prompt Engineer in a company, and they tasked him to craft a prompt system that generates high-quality SAT-style multiple-choice questions. Quite a quest, right? Well, stick around, and we'll take a deep dive into the prompt engineering we used to help him. The mission was precise: Write a prompt to yield an SAT-style multiple-choice question that rigorously tests a student's understanding of "Algebraically solving systems of 2 linear equations in 2 variables". The challenge didn't end there; the question produced had to meet the hard/difficult mark set by real SAT questions. Using the OpenAI Playground, we conducted incisive iterations, testing each prompt separately to mitigate any bias from prior outputs. Our approach was: - …  ( 11 min )
    Warner on AI regulation: ‘We probably can't solve it all at once’
    submitted by /u/smo279 [link] [comments]  ( 9 min )
    Courses for more Seniors
    Hello all, What course would you recommend for those of us who are older and already settled in other careers. For example I'm 35 and a manager so I wouldn't need a course to actually design AI or anything. It would be more related to understanding how/where to implement it in an organisation. Any tips? Cheers and merci submitted by /u/JYanezez [link] [comments]  ( 9 min )
    Show-1: Marrying Pixel and Latent Diffusion Models for Efficient and High-Quality Text-to-Video Generation
    A new paper proposes Show-1, a hybrid model that combines pixel and latent diffusion for efficient high-quality text-to-video generation. Both of these approaches have tradeoffs, so researchers at the National University of Singapore tried a hybrid approach combining both, and shared the results in a paper published yesterday. My highlights from the paper: Pixel diffusion excels at low-res video generation precisely aligned with text Latent diffusion acts as efficient upsampling expert from low to high res Chaining the two techniques inherits benefits of both Show-1 achieves strong alignment, quality, and 15x less inference memory The key is using pixel diffusion for the initial low-resolution stage. This retains alignment with text descriptions. Latent diffusion then serves as a super-resolution expert, upsampling efficiently while preserving fidelity. By blending complementary techniques, Show-1 moves past tradeoffs limiting the individual models. More details here. Paper is here (includes links to example generations). submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    What AI makes images that subtly form a word, like this one?
    submitted by /u/samuraiogc [link] [comments]  ( 9 min )
    Getting emotional with LLMs can increase performance by 115% (Case Study)
    submitted by /u/Senior_tasteey [link] [comments]  ( 9 min )
    Question about a small project
    My sister and I have a small project we are thinking about working on. The idea is basically that we are going to enter the same prompt, separately, into an image-generating AI of some sort (DALL-E 2, etc.) over a period of time and hopefully see the result change. We would probably pick words or phrases that are topical and debated. This only works, though, if the AI isn't just trained on old data and has an active connection to the internet. My question is therefore: is there an AI right now that would fit the task? Sorry if the question is dumb or I didn't explain myself clearly! submitted by /u/Mejwynn [link] [comments]  ( 9 min )
    One-Minute Daily AI News 9/27/2023
    ODIN integrates Large Language Models (LLMs) into Obsidian using LangChain, allowing you to ask questions about the data stored in your knowledge graph right from the prompt bar.[1] ChatGPT users can now browse internet, OpenAI says.[2] Adobe’s Photoshop on the web launch includes its popular desktop AI tools.[3] The White House plans to introduce a highly anticipated executive order in the coming weeks dealing with artificial intelligence, President Joe Biden said Wednesday.[4] Sources: [1] https://github.com/memgraph/odin [2] https://www.reuters.com/technology/openai-says-chatgpt-can-now-browse-internet-2023-09-27/ [3] https://www.theverge.com/2023/9/27/23892889/adobe-photoshop-for-the-web-firefly-ai-generative-fill-full-release-price-date [4] https://www.cnn.com/2023/09/27/tech/joe-biden-executive-order-artificial-intelligence/index.html submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Multimodal AI's will cause people to embrace their own reality bubbles and that is bad news for dictatorships
    I have been messing with llama. Trying to make a script to make a movie. Sort of realize it is not there yet, it can write decoherent long stories / what ever you want. You can couple it with stable diffusion to make images that would have to be described better to fit the "movie" or narrative. It is not there yet, ChatGPT can already do this, you can ask it to tell you a story and describe the visual scenes. At the same time, we are getting audio generation from things like audioldm2 and stableaudio etc. Multimodal AI's are almost here. Pretty soon we will have devices in our pockets powered by AI chips that will be able to generate what ever reality we want. We can feed them images from our past and they can allow us to live in VR reality of the past. Or we can choose to live in anot…  ( 10 min )
    Jazz Fusion (AI Generated DnB & Jazz music and video)
    submitted by /u/LibeerCZ [link] [comments]  ( 9 min )
  • Open

    Modern reinforcement learning for video game NPCs
    submitted by /u/akliyen [link] [comments]  ( 9 min )
    Reinforcement learning in automating game testing
    The role of Reinforcement learning in automating game testing is becoming increasingly crucial, making it more efficient and effective. Manual testing, while essential, is extremely time-consuming and subject to human error. Our opensource library SheepRL 🐑 can be used to test whether the game dynamics is well defined: what if a player can finish the game with just a few moves? 🎮 This video shows that our agent (Kasumi, on the left) is able to win the game in the hardest modality by standing down and throwing kicks. 🥊 This can be helpful for a game developer to: ​ understand where and how intervene to achieve a more playful game predict and correct bugs early in the game development process enhance the gaming experience and final product quality reduce time and resources spent on debugging. The game has changed 🔥 and it is up to us to play it with (human + artificial) intelligence! Thanks to u/DIAMBRA_AIArena for the video! --- ❌ Are you interested in joining the project community? Get in touch ❌ SheepRL 🐑 is open-source, fully written in PyTorch and accelerated with LightningFabric - by Lightning AI Feel free to use it for your Artificial Intelligence projects, and if you want to contribute, we are more than happy to accept your pull requests! ❤️ https://reddit.com/link/16uht6v/video/ve3derxsc0rb1/player submitted by /u/Manu_Orobix [link] [comments]  ( 9 min )
    Proofs in the original Q-Learning technical notes
    I'm not sure it's the right place for this, but I was going through the proofs in the "original" 1992 technical notes of Q-learning, and a couple of points raised some questions: 1) In the Proof of lemma B.4: https://preview.redd.it/7g6pputdqwqb1.png?width=1006&format=png&auto=webp&s=fe4afeac3b06deee6c80105b280a0085bdcfbe51 where do P_{xy}^2(a_2) and R_x(a_2) come from? If we apply the definitions of Q'(x, a_1, a_2) and Q(x, a_1, a_2) to get the bound, P_{xy}^2(a_2) and R_x(a_2) should not be there. Are they just notation errors or is it correct and I'm missing something? ​ 2) I don't quite get how the bounds on P and R are computed in Section 3.2: https://preview.redd.it/p06ysjewqwqb1.png?width=962&format=png&auto=webp&s=a5929e701099dc6e4543efe7681f96f12f543fa8 Considering the results in B.4 (i.e., the bounds for the distance between the chain's P, R and the real ones), I don't understand how they arrive at this conclusion. ​ I'd greatly appreciate any intuitions about these, or if someone can point me in the right direction :) submitted by /u/Beautiful_Zebra_198 [link] [comments]  ( 9 min )
  • Open

    [N] We Collaborated with Outerbounds to Enable HPC and Ray Integration in Metaflow
    Here is our blog post - please check it out: https://forums.autodesk.com/t5/engineering-hub-blog/autodesk-and-outerbounds-partner-to-open-source-ray-and-hpc/ba-p/12254816 And try out the metaflow-ray extension here: https://github.com/outerbounds/metaflow-ray submitted by /u/rirhun [link] [comments]  ( 9 min )
    [D] What are the options for the most human TTS?
    So I’ve been using elevenlabs but it burns through characters really fast. What are the best options for the most human sounding TTS available? I’ve been looking into tortoise, but would like to see if there are other options I should be looking into. submitted by /u/Long8D [link] [comments]  ( 9 min )
    [D] How do we know Closed source released benchmarks aren't being heavily optimized, through outside means?
    I've recently started working with ML and NLP, so I'm sorry if this sounds Naive. Unlike Llama 2 or other open source, we don't have access to the model weights for GPT-4, Claude or Bard, so Benchmark Evals are being run through either APIs or the chat Interface. So how do we know that the model isn't being Boosted by custom web-searching abilities or RAG? While GPT-4 might have a turnoff option, I'm pretty sure Bard is always online, being built by google. So how do we trust benchmarks? Also, have any opensource been tested after Websearch/RAG? submitted by /u/vatsadev [link] [comments]  ( 9 min )
    [R] Searching for a regression dataset with structure in its prediction
    I am searching for a relatively simple dataset to train a regressor that has some structure in its predictions. Can't be too tiny cause I have to try out a NN architecture. It must have at least some continuous feature but can also have additional categorical or related discrete structures. I usually work with vision tasks, so I am not sure if I miss something obvious I could try? Open for ideas! I thoughts about predicting rows in some tabular dataset? Anything suitable that comes to mind? submitted by /u/LeanderKu [link] [comments]  ( 9 min )
    [N] CUDA Architect and Cofounder of MLPerf: AMD's ROCm has achieved software parity with CUDA
    Greg Diamos, the CTO of startup Lamini, was an early CUDA architect at NVIDIA and later cofounded MLPerf. He asserts that AMD's ROCm has "achieved software parity" with CUDA for LLMs. Lamini, focused on tuning LLMs for corporate and institutional users, has decided to go all-in with AMD Instinct GPUs. https://www.crn.com/news/components-peripherals/llm-startup-embraces-amd-gpus-says-rocm-has-parity-with-nvidia-s-cuda-platform submitted by /u/makmanred [link] [comments]  ( 9 min )
    [P] Request to test Mirage: A platform to search and generate images, videos, audio, and 3D assets using natural language
    Mirage is the infinite asset library that helps you find or create the perfect digital asset. 🗨️ Just Search Naturally: No awkward keywords—Mirage understands you. 🤖 State-of-the-Art Models: Can't find it? Generate it, thanks to open-source models. 🔍 Similarity Search: Discover more of what you love with a single click. 🤗 Fully Personalized: Our AI librarian learns your style to show you assets you'll dig. Website Link: MirageML Open-Source Github: Github Development Status: Beta I would love to get some honest feedback! submitted by /u/perception-eng [link] [comments]  ( 9 min )
    [P] Request to test Domeis: A new platform for no-code Machine Learning
    Domeis is a no-code Machine Learning platform that offers a dashboard to design, train and test Machine Learning algorithms, as well as to import, pre-process and cleanse data, all from the Graphical User Interface and without writing a single line of code. The aim of this platform is two-fold: Making Machine Learning accessible to anyone and not just Data Scientists or experienced software developers. By offering the possibility to design, train and test Machine Learning models directly via GUI, being an experienced software developer is no longer a pre-condition for creating ML models Making Machine Learning model creation, training and testing faster for experienced Data Scientists / Machine Learning Engineers. By drastically reducing the time needed to set up environments, import data and define models, Domeis allows Machine Learning practitioners to focus on trying out and compare different models/approaches. Website Link: https://www.domeis.it/ Development Status: Alpha I would love to get some honest feedback! submitted by /u/Ok_Hold_5385 [link] [comments]  ( 9 min )
    [D] Help understanding convergence proof (Adaptive learning rate + Momentum)
    Hello everyone, I am trying to understand the convergence analysis/derivation of the momentum algorithm, or the stochastic heavy ball algorithm, using the regret bound analysis from different research papers. https://ieeexplore.ieee.org/document/7330562 - Page3 https://www.mdpi.com/2504-3110/6/12/709 - Page6 http://arxiv.org/abs/1707.01647 - Page4 ​ In the derivation, there is the following simplification, which I do not understand at all ​ $\frac{2\boldsymbol{\eta}_{k}}{(1-\beta)}\sum_{k=0}^{T}\left|J(\theta_k) - J(\theta^*)\right| + \frac{2\boldsymbol{\eta}_{k}\beta}{(1-\beta)^2} \sum_{k=0}^{T}\left|J(\theta_k) - J(\theta_{k-1})\right| \leq \ \left|\boldsymbol{\theta}_{0} + \boldsymbol{p}_{0} - \boldsymbol{\theta}^* \right|^2 - \left|\boldsymbol{\theta}_{T+1} + \boldsymbol…  ( 9 min )
    [D]Any researchers or institutions in USA that follows Ai-compression relationships specifically like deepmind
    I have tried to follow the main collaborators of Hutter and other prominent scientists to track this, but they are mostly in Europe with some in Australia. American institutions seems to be more interested in more open ai like deep neural networks. If anyone is familiar with any US based institutions that does notable work in this line,please comment submitted by /u/Netero1999 [link] [comments]  ( 9 min )
    [R] Brain Tumor segmentation
    Can any of you suggest me computer science research ideas related to brain tumor segmentation using UNet. submitted by /u/Eleonora467 [link] [comments]  ( 9 min )
    [P] BionicGPT - ChatGPT replacement that lets you run RAG on confidential data
    BionicGPT is an open source WebUI that gives enterprises the ability to run Retrieval Augmented Generation (RAG) on their on-premise documents. To allow people to get up to speed, we deploy with a quantized 7B model that runs on CPU. Github Repo: https://github.com/purton-tech/bionicgpt We basically implement a RAG pipeline including document upload, embeddings generation and subsequent retrieval. Feedback: We'd love to get some feedback in the form of GitHub issues or comments here. Screenshot: https://preview.redd.it/uiw0wqul30rb1.png?width=2447&format=png&auto=webp&s=8ad7e61ed048258c19aa63bf7c94d12da5b721fa submitted by /u/purton_i [link] [comments]  ( 9 min )
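    For readers unfamiliar with what the retrieval half of such a pipeline does, here is a deliberately tiny sketch. The embedding model, example documents, and prompt format are placeholder choices for illustration, not BionicGPT's actual implementation.

        import numpy as np
        from sentence_transformers import SentenceTransformer  # placeholder embedding model choice

        docs = [
            "Our refund policy allows returns within 30 days of purchase.",
            "Support is available Monday to Friday, 9am to 5pm CET.",
            "The on-premise installer requires Kubernetes 1.24 or newer.",
        ]

        model = SentenceTransformer("all-MiniLM-L6-v2")
        doc_emb = model.encode(docs, normalize_embeddings=True)

        def retrieve(query: str, k: int = 2):
            q = model.encode([query], normalize_embeddings=True)[0]
            scores = doc_emb @ q                   # cosine similarity (embeddings are normalized)
            top = np.argsort(scores)[::-1][:k]
            return [docs[i] for i in top]

        context = "\n".join(retrieve("What do I need to install it on my own cluster?"))
        prompt = f"Answer using only this context:\n{context}\n\nQuestion: What are the install requirements?"
        print(prompt)  # this prompt would then be sent to the local (quantized 7B) model

    A production system adds document chunking, a vector database, and access controls on top of this core loop.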
    [N] First Impressions with GPT-4V(ision)
    My colleague Piotr and I have been testing GPT-4V(ision) over the last day. We wrote up our findings, covering how GPT-4V performs on: Visual question answering (VQA) across a range of domains (locations, movies, plants) OCR Math OCR Object detection And more TL;DR: GPT-4V performed well for VQA and document OCR but struggled with OCR on real-world images and object detection (where we asked for bounding boxes). https://blog.roboflow.com/gpt-4-vision/ I would love to hear what other people have found working with GPT-4V. submitted by /u/zerojames_ [link] [comments]  ( 9 min )
    Linear Regression Queries [D]
    I am a beginner in Data Science. I have recently enrolled in the supervised machine learning course by Andrew Ng on Coursera. I am now familiar with linear regression and gradient descent. However, I faced a certain issue. In the optional lab, there was a task to calculate the value of the cost function using gradient descent for linear regression. I wrote the code in my notebook by myself and cross-checked it to be correct. However, my w, b values are very different from the desired output, yet the cost function yields a better result in my code. Another thing I noticed: we scale only the x variables, leaving the values of y untouched. I have two major queries now: Is ending up with different w, b values fine as long as the cost function is minimal? (w is a numpy array) Why do we scale the x variables only? Why don't we scale the y variables? Thanks in advance. submitted by /u/healing_you [link] [comments]  ( 9 min )
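    To make the scaling question concrete: standardizing the x features reshapes the cost surface so gradient descent converges with a sensible learning rate, while y can stay in its original units. A small sketch of the usual pattern (the data and learning rates here are made up for illustration):

        import numpy as np

        # Toy data: two features on very different scales, linear target with noise.
        rng = np.random.default_rng(0)
        X = np.column_stack([rng.uniform(0, 1, 200), rng.uniform(0, 2000, 200)])
        y = 3 * X[:, 0] + 0.01 * X[:, 1] + rng.normal(0, 0.1, 200)

        # Standardize the features only; y is left untouched.
        X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

        def gradient_descent(X, y, lr=0.1, iters=1000):
            w = np.zeros(X.shape[1])
            b = 0.0
            m = len(y)
            for _ in range(iters):
                err = X @ w + b - y
                w -= lr * (X.T @ err) / m
                b -= lr * err.mean()
            cost = float((err ** 2).mean() / 2)
            return w, b, cost

        print(gradient_descent(X_scaled, y))    # converges quickly with lr=0.1
        print(gradient_descent(X, y, lr=1e-7))  # unscaled features force a tiny lr and slow convergence

    Note also that the w learned on scaled features is numerically different from the "true" coefficients of the raw features even when the cost is low, which is one common reason your w, b can legitimately differ from the lab's reference values.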
    [P] Hands-on open-source workflows for voice AI
    Hey r/MachineLearning, we made a tutorial that showcases typical workflows and tooling for voice analytics applications. The tutorial is intended for intermediate-level ML practitioners. The walkthrough is purely based on open source software and covers: Efficient interactive data exploration and inspection Dataset handling and inference on pre-trained models Model debugging and identification of critical data clusters Model comparison and selection ​ https://i.redd.it/j15gk3kkgyqb1.gif 🔗 Blog with code: https://medium.com/p/dbfd923a5a79#432e-3559ae606f80 🤗 Interactive demo: https://huggingface.co/spaces/renumics/emodb-model-debugging ​ ​ submitted by /u/44sps [link] [comments]  ( 9 min )
    [D] CV annotations and work with COCO/YOLO dataset
    Hi everyone. In my job I work with a lot of data for Computer Vision, and I use Label Studio for annotations. The last time I worked with it, I lost some of my annotations, which I need for other purposes. I have the final result as YOLO and COCO datasets, but I cannot import the results from them to recover what I need. Can you suggest good applications with an intuitive UI to import a COCO or YOLO dataset and work with the labels? submitted by /u/thattallsoldier [link] [comments]  ( 9 min )
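    If the immediate goal is just to recover the boxes and labels from the exported COCO file (for inspection or re-import elsewhere), the pycocotools API can read it directly. A minimal sketch, assuming the export is a standard COCO instances JSON (the file path is a placeholder):

        from pycocotools.coco import COCO

        coco = COCO("annotations/instances.json")   # placeholder path to the exported file

        cat_names = {c["id"]: c["name"] for c in coco.loadCats(coco.getCatIds())}

        for img_id in coco.getImgIds()[:5]:         # first few images as a sanity check
            img = coco.loadImgs(img_id)[0]
            anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id))
            print(img["file_name"])
            for ann in anns:
                x, y, w, h = ann["bbox"]            # COCO boxes are [x, y, width, height]
                print(f"  {cat_names[ann['category_id']]}: ({x:.0f}, {y:.0f}, {w:.0f}, {h:.0f})")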
    [P] Request to Test PyMilo: A New Python Library for Machine Learning I/O
    Pymilo is an open-source Python package that offers an efficient, safe, and transparent method for transporting pre-trained machine-learning models. The motivation for developing this package is to eliminate the risks of binary or pickle formats. As this library is still in its early stages of development, it currently supports only a limited number of machine learning models provided by Scikit-learn. Nevertheless, it will be precious if the community utilizes this library and provides us with their feedback about improving the package's interface and prioritizing future developments. Your cooperation would be invaluable to us. In the following, I provide an example of how to utilize PyMilo. GitHub Repo: https://github.com/openscilab/pymilo Development Status: Alpha Simple Linear Mode…  ( 9 min )
    [Discussion] Interesting interview question
    Was asked something similar to the following question in an interview for a ML role and was curious how others would answer this: Say you have a dataset with one feature column and one label column (with different classes). Assume this data is too large to fit into memory and could be infinite in size (e.g data is coming in as a stream). How would you train a ML model on this data to accurately predict the label? Followup: instead of one feature column, what if you had several thousand? How would you decide which features to use given the size of the dataset? I discussed online sampling (resevoir sampling, etc) as a way to get a training dataset that could fit in memory + continually train on that but the interviewer did not seem convinced. Any thoughts? submitted by /u/scpdstudent [link] [comments]  ( 9 min )
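    Alongside reservoir sampling, one standard answer for this setup is incremental (out-of-core) learning: keep a model that supports partial_fit and update it on each mini-batch as it arrives, so nothing larger than one batch ever sits in memory. A minimal scikit-learn sketch (the synthetic stream, classes, and batch shape are illustrative):

        import numpy as np
        from sklearn.linear_model import SGDClassifier

        classes = np.array([0, 1, 2])                 # all labels must be declared up front
        model = SGDClassifier(loss="log_loss")

        def stream_of_batches(n_batches=100, batch_size=256, n_features=20, seed=0):
            """Stand-in for an infinite data stream."""
            rng = np.random.default_rng(seed)
            for _ in range(n_batches):
                X = rng.normal(size=(batch_size, n_features))
                y = (X[:, 0] + X[:, 1] > 0).astype(int) + (X[:, 2] > 1).astype(int)
                yield X, y

        for X_batch, y_batch in stream_of_batches():
            model.partial_fit(X_batch, y_batch, classes=classes)  # one update per incoming batch

    For the follow-up about thousands of features, the same pattern is usually combined with feature hashing or an incremental feature-importance estimate computed over the stream, so the feature set can be reduced without ever materializing the full dataset.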
    [D] What appropriate loss function to use for "Search recall" optimization?
    I'm studying the application of ML to improve searches. Here's a couple of example scenarios: Document retrieval (search) system: We have a (source) document with us and we're trying to find a matching document in a database. The source document has text and image attributes - for simplicity let's say a title and a single image. Each search result will also be a document - with a title and at most one image. A search engine: We have a query comprised of both text and an image (like google image search allows text to be added to the query as well). Each search result will be a website with text and image attributes (for simplicity, webpage title and at most one image) More generally, I have a search system - whatever we're trying to search for has text and an image associated with it…  ( 11 min )
    [D] How Does Your Organization Approach Machine Learning Projects Phase by Phase?
    How does the development process of a Machine Learning project unfold phase-by-phase within your organization? Could you please specify the type of organization you are, the time spent on each phase, as well as any aspects you consider to be weak or fundamental? It would also be great if you could share any tips or tricks you've learned that have changed your perspective. submitted by /u/Spiritual_Narwhal649 [link] [comments]  ( 9 min )
    [P] Rubik's Cube Square Detection
    Hello everyone, I am trying to detect the 9 squares of a face of a Rubik's Cube through a camera. The idea is that I want to use my computer's camera, tell the user to show all the Rubik's Cube faces, and read the faces so I can feed them to a solver. Here are the steps I have tried so far: (1) sharpened square edges, (2) obtained a binary image and removed noise, (3) detected and extracted squares. Some methods I used were different blurs and cv functions, but nothing worked reliably. Sometimes it can get all 9 squares, but sometimes it doesn't. There also seems to be a difference between colors; for example, the model can detect green squares more easily than yellow squares. Can anyone provide advice as to how I can detect the squares on the face? https://preview.redd.it/1ht9f4h31wqb1.png?width=2180&format=png&auto=webp&s=32d23515a43406c0f8828e6790ad71e754b0ab80 submitted by /u/uglyboi34 [link] [comments]  ( 9 min )
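    Since some colours threshold much less reliably than others (yellow especially), one common approach is to segment in HSV space with one range per sticker colour and then keep only roughly square contours. A rough OpenCV sketch follows; the HSV ranges are illustrative guesses and will need tuning to your lighting and camera.

        import cv2
        import numpy as np

        frame = cv2.imread("cube.jpg")                  # placeholder input image
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

        # Illustrative HSV ranges; yellow in particular usually needs per-camera tuning.
        ranges = {
            "green":  ((40, 80, 80),  (85, 255, 255)),
            "yellow": ((20, 80, 80),  (35, 255, 255)),
            "red":    ((0, 100, 80),  (10, 255, 255)),
        }

        squares = []
        for color, (lo, hi) in ranges.items():
            mask = cv2.inRange(hsv, np.array(lo), np.array(hi))
            mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
            contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
            for c in contours:
                x, y, w, h = cv2.boundingRect(c)
                if cv2.contourArea(c) > 500 and 0.8 < w / h < 1.25:   # keep roughly square blobs
                    squares.append((color, (x, y, w, h)))

        print(len(squares), "candidate stickers found")

    If you know roughly where the cube face is, an even more robust trick is to locate the face as one big quadrilateral first and simply sample the colour at the 9 expected sticker centres, rather than detecting each sticker independently.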
  • Open

    DynIBaR: Space-time view synthesis from videos of dynamic scenes
    Posted by Zhengqi Li and Noah Snavely, Research Scientists, Google Research A mobile phone’s camera is a powerful tool for capturing everyday moments. However, capturing a dynamic scene using a single camera is fundamentally limited. For instance, if we wanted to adjust the camera motion or timing of a recorded video (e.g., to freeze time while sweeping the camera around to highlight a dramatic moment), we would typically need an expensive Hollywood setup with a synchronized camera rig. Would it be possible to achieve similar effects solely from a video captured using a mobile phone’s camera, without a Hollywood budget? In “DynIBaR: Neural Dynamic Image-Based Rendering”, a best paper honorable mention at CVPR 2023, we describe a new method that generates photorealistic free-viewp…  ( 92 min )
    Re-weighted gradient descent via distributionally robust optimization
    Ramnath Kumar, Pre-Doctoral Researcher, and Arun Sai Suggala, Research Scientist, Google Research Deep neural networks (DNNs) have become essential for solving a wide range of tasks, from standard supervised learning (image classification using ViT) to meta-learning. The most commonly-used paradigm for learning DNNs is empirical risk minimization (ERM), which aims to identify a network that minimizes the average loss on training data points. Several algorithms, including stochastic gradient descent (SGD), Adam, and Adagrad, have been proposed for solving ERM. However, a drawback of ERM is that it weights all the samples equally, often ignoring the rare and more difficult samples, and focusing on the easier and abundant samples. This leads to suboptimal performance on unseen data, espe…  ( 92 min )
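    As background for readers, the contrast between ERM and a re-weighted objective can be made concrete with a toy PyTorch snippet. This is only a generic illustration of up-weighting high-loss samples, not the specific distributionally robust algorithm described in the post above; the temperature and batch shapes are arbitrary.

        import torch
        import torch.nn.functional as F

        def erm_loss(logits, targets):
            # Standard ERM: every sample contributes equally to the average loss.
            return F.cross_entropy(logits, targets)

        def reweighted_loss(logits, targets, temperature=1.0):
            # Toy re-weighting: harder (higher-loss) samples get larger weights via a softmax.
            per_sample = F.cross_entropy(logits, targets, reduction="none")
            weights = torch.softmax(per_sample.detach() / temperature, dim=0)
            return (weights * per_sample).sum()

        logits = torch.randn(8, 5, requires_grad=True)
        targets = torch.randint(0, 5, (8,))
        print(erm_loss(logits, targets).item(), reweighted_loss(logits, targets).item())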
  • Open

    Accenture creates a Knowledge Assist solution using generative AI services on AWS
    This post is co-written with Ilan Geller and Shuyu Yang from Accenture. Enterprises today face major challenges when it comes to using their information and knowledge bases for both internal and external business operations. With constantly evolving operations, processes, policies, and compliance requirements, it can be extremely difficult for employees and customers to stay up […]  ( 8 min )
    Speed up your time series forecasting by up to 50 percent with Amazon SageMaker Canvas UI and AutoML APIs
    We’re excited to announce that Amazon SageMaker Canvas now offers a quicker and more user-friendly way to create machine learning models for time-series forecasting. SageMaker Canvas is a visual point-and-click service that enables business analysts to generate accurate machine learning (ML) models without requiring any machine learning experience or having to write a single line of code. SageMaker […]  ( 7 min )
    Robust time series forecasting with MLOps on Amazon SageMaker
    In the world of data-driven decision-making, time series forecasting is key in enabling businesses to use historical data patterns to anticipate future outcomes. Whether you are working in asset risk management, trading, weather prediction, energy demand forecasting, vital sign monitoring, or traffic analysis, the ability to forecast accurately is crucial for success. In these applications, […]  ( 10 min )
    Create a Generative AI Gateway to allow secure and compliant consumption of foundation models
    In the rapidly evolving world of AI and machine learning (ML), foundation models (FMs) have shown tremendous potential for driving innovation and unlocking new use cases. However, as organizations increasingly harness the power of FMs, concerns surrounding data privacy, security, added cost, and compliance have become paramount. Regulated and compliance-oriented industries, such as financial services, […]  ( 13 min )
    Beyond forecasting: The delicate balance of serving customers and growing your business
    Companies use time series forecasting to make core planning decisions that help them navigate through uncertain futures. This post is meant to address supply chain stakeholders, who share a common need of determining how many finished goods are needed over a mixed variety of planning time horizons. In addition to planning how many units of […]  ( 11 min )
    Announcing New Tools to Help Every Business Embrace Generative AI
    From startups to enterprises, organizations of all sizes are getting started with generative AI. They want to capitalize on generative AI and translate the momentum from betas, prototypes, and demos into real-world productivity gains and innovations. But what do organizations need to bring generative AI into the enterprise and make it real? When we talk […]  ( 13 min )
  • Open

    How will the Big Data market evolve in the future?
    Big data has been around for some time now, becoming a more or less common concept in business. However, recent developments in AI technology have shaken up an already volatile field, inviting us to reconsider our projections of how the big data market will look in the future. We can already see the signs that… Read More »How will the Big Data market evolve in the future? The post How will the Big Data market evolve in the future? appeared first on Data Science Central.  ( 22 min )
  • Open

    Kicking Games Up a Notch: Startup Sports Vision AI to Broadcast Athletics Across the Globe
    Pixellot is scoring with vision AI — making it easier for organizations to deliver real-time sports broadcasting and analytics to viewers across the globe. A member of the NVIDIA Metropolis vision AI partner ecosystem, the company based near Tel Aviv offers an AI-powered platform that automates the capturing, streaming and analysis of sporting events. It’s Read article >  ( 7 min )
    V for Victory: ‘Cyberpunk 2077: Phantom Liberty’ Comes to GeForce NOW
    The wait is over. GeForce NOW Ultimate members can experience Cyberpunk 2077: Phantom Liberty on GOG.com at full GeForce RTX 4080 quality, with support for NVIDIA DLSS 3.5 technology. It’s part of an action-packed GFN Thursday, with 26 more games joining the cloud gaming platform’s library, including Quake II from id Software. A New Look Read article >  ( 8 min )
  • Open

    AI Frontiers: Measuring and mitigating harms with Hanna Wallach
    Powerful large-scale AI models like GPT-4 are showing dramatic improvements in reasoning, problem-solving, and language capabilities. This marks a phase change for artificial intelligence—and a signal of accelerating progress to come.    In this Microsoft Research Podcast series, AI scientist and engineer Ashley Llorens hosts conversations with his collaborators and colleagues about what these models—and the […] The post AI Frontiers: Measuring and mitigating harms with Hanna Wallach appeared first on Microsoft Research.  ( 29 min )

  • Open

    How can AI recreate the lack of information?
    Hey there! Are there guys here who possess a strong grasp of AI neural network logic? ​ I've extracted a character from an anime scene using a mask, and saved it as a PNG sequence which contains solely the anime character along with an alpha (transparent) background. ​ I'm curious about how the Flowframes neural network can recreate the background that was originally behind the character but removed by the mask. It's impossible since the PNG images don't have that background info. ​ Can anyone explain how this works? ​ Attachments: - Image #1: https://preview.redd.it/z2bypfkstvqb1.png?width=1920&format=png&auto=webp&s=c534167c5ae4129c04f9b8b2fbca3bac350a1d4a - Image #2: https://preview.redd.it/x5kkzs2ttvqb1.png?width=1920&format=png&auto=webp&s=6838d7ca5e1e4f19ba46c04750fdaea537a787f0 (Don't mind the black background in the thumbnails, it's a bug, there's actually a transparent background) ​ * Flowframes is a app that utilizes advanced AI frameworks to interpolate videos in order to increase their framerate in the most natural looking way possible. submitted by /u/drkysqrl [link] [comments]  ( 9 min )
    (Pt. 2) Inductive Logic Programming with LNN's
    submitted by /u/Neurosymbolic [link] [comments]  ( 9 min )
  • Open

    Graph Feature vector (embedding) [D]
    Hey all, I'm trying to build a regression algorithm for a dataset where I have one graph per patient, representing a location in their brain from MRI images. Right now I don't have a ton of data, so I'm looking for some way to take each graph I have and get a feature vector for it to put into a regression algorithm. So for 100 patients, I have 100 graphs, and I'd like 100 feature vectors, one representing each patient's graph. My issue is finding an algorithm that takes in the entire graph and outputs a single feature vector. I've been looking at some libraries, but they all seem wildly scattered. I don't want to grab a bunch of node embeddings and do some elementary merge of them, like an average or sum, etc. Any help pointing me to Python libraries, algorithms, or anything else that can do this would be appreciated. Thank you so much. submitted by /u/kaleb7589 [link] [comments]  ( 9 min )
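    With only ~100 graphs, a reasonable first baseline before learned whole-graph embeddings (Graph2Vec-style methods) is a handcrafted graph descriptor built with networkx. A sketch of that baseline is below; the random graphs are stand-ins for the MRI-derived graphs, and the chosen statistics are just examples.

        import networkx as nx
        import numpy as np

        def graph_features(G: nx.Graph) -> np.ndarray:
            """Whole-graph descriptor: one fixed-length vector per patient graph."""
            degrees = [d for _, d in G.degree()]
            return np.array([
                G.number_of_nodes(),
                G.number_of_edges(),
                nx.density(G),
                nx.average_clustering(G),
                np.mean(degrees) if degrees else 0.0,
                np.max(degrees) if degrees else 0.0,
                nx.number_connected_components(G),
            ])

        # One feature vector per patient graph, stacked into a design matrix for regression.
        graphs = [nx.erdos_renyi_graph(20, 0.2, seed=i) for i in range(100)]  # stand-in graphs
        X = np.vstack([graph_features(G) for G in graphs])
        print(X.shape)  # (100, 7)

    If this baseline underfits, whole-graph embedding methods or a small graph neural network with a global pooling readout are the usual next steps, but with 100 samples the simple descriptor plus a regularized regressor is often competitive.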
    Normalization in VAE[D]
    I am training a variational autoencoder. First I tried batch normalization before sending the data to the network, and someone, probably wisely, pointed out that this is not correct. If I don't use batch norm, my training fails due to numerical instability. I then tried scaling my data beforehand using StandardScaler from sklearn, and now my training works. Is this reasonable? Any other thoughts? submitted by /u/Global-Gene2392 [link] [comments]  ( 9 min )
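    If the standard-scaler route works, the main things to keep in mind are fitting the scaler on the training split only, reusing it at inference time, and inverting it when interpreting reconstructions. A minimal sketch (the data and split sizes are placeholders):

        import numpy as np
        from sklearn.preprocessing import StandardScaler

        X = np.random.randn(1000, 32) * 50 + 10        # stand-in for the raw training data
        X_train, X_val = X[:800], X[800:]

        scaler = StandardScaler().fit(X_train)         # fit on the training split only
        X_train_s = scaler.transform(X_train)
        X_val_s = scaler.transform(X_val)              # reuse the same statistics at validation/inference

        # Reconstructions from the VAE decoder live in scaled space, so map them back
        # to the original units before interpreting them.
        recon_scaled = X_val_s                         # placeholder for decoder output
        recon = scaler.inverse_transform(recon_scaled)

    This avoids the subtle leakage and train/inference mismatch that applying batch norm to the raw inputs can introduce.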
    [P] Predicted stock data with TensorFlow is very different from actual data
    I'm following a YouTube video to create a simple machine learning model to predict stock prices. I have to reshape my prediction data so it works with inverse_transform, but in the video he doesn't do this. If I don't reshape it I get an error, but I think when I do reshape it it messes with the data. The predicted values are all very similar. I've tried messing with epoch and batch sizes, and changing other metrics like prediction_days, but nothing has worked. This is what the prediction data looks like when plotted, and this is what it looks like when printed. Does anyone know what could be causing this? Here's my code submitted by /u/darkshadowtrail [link] [comments]  ( 9 min )
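    A common source of this exact symptom is inverse-transforming with the wrong shape or the wrong scaler. A small sketch of the pattern that usually behaves, assuming a single-column MinMaxScaler fitted on the closing prices (all names and values are illustrative, not taken from the linked code):

        import numpy as np
        from sklearn.preprocessing import MinMaxScaler

        closes = np.random.rand(500) * 100 + 50               # stand-in for the real closing prices

        scaler = MinMaxScaler()
        scaled = scaler.fit_transform(closes.reshape(-1, 1))  # the scaler is fitted on (n, 1) data

        # ... train the model on windows of `scaled` ...

        preds_scaled = scaled[-30:].ravel()                   # stand-in for model.predict(...) output

        # inverse_transform expects the same 2-D, single-column shape the scaler was fitted on.
        preds = scaler.inverse_transform(preds_scaled.reshape(-1, 1)).ravel()
        print(preds[:5])

    If the predictions still collapse to nearly identical values after fixing the shape, the issue is usually upstream of the inverse transform (the model output itself), not the reshape.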
    New subreddit rule idea [D]
    This subreddit will continue to die if it doesn't foster discussion of the latest research and reduce low-quality posts. However, making a judgement as to what is or is not low-quality is time-consuming and subjective -- not something the mods should be doing. To this end, I had the following new rule idea: If it's your first time at Fight If it's your first post in this subreddit, it needs to be a link to arxiv (Or, more generally, the number of your non-arxiv posts cannot exceed the number of your arxiv posts) All arxiv posts must be standard links to the abstract page (to catch reposts and to connect discussions of the same paper in different subreddits) An arxiv post must be a paper you've read yourself, and you should post a comment describing what you liked and DIDN'T like about it (Let the airing of grievances begin! I think this will help seed the discussion, which is really the raison d'être of this subreddit) If the post or the comment get downvoted, they do not count. What do you think? Will this help steer this subreddit in the right direction? Is this enforceable? submitted by /u/we_are_mammals [link] [comments]  ( 9 min )
    [D] How feasible is it to complete a course.
    Hi I am a physicist (1st year in masters) and I decided to take NN class (for cs students). I have a decent experience with python but I have never done low level coding. The class project requires a C++ implementation of NN with back propagation algorithm. I am quite confident in my learning ability, nonetheless, do you guys think it is feasible for me to code such a project in 13 weeks (I also have other subjects and cant just spend all my time on this)? Thanks submitted by /u/merimace [link] [comments]  ( 9 min )
    [P][D] Need Guidance on Building a Chatbot like ChatGPT for University Data - Help!
    Hey fellow Redditors, I find myself in quite a situation and could use some guidance. Recently, I introduced my professor to privateGPT and demonstrated its capabilities using a small set of college data. To my delight, he was impressed and has now tasked me with researching and developing a ChatGPT-like chatbot, but with access to our university's extensive data. Here's where I need your help: my professor wants this chatbot to be hosted on our university's systems due to privacy concerns, which means I can't use ChatGPT's API. I've been given access to Sol HPC, but I'm finding it quite confusing to get started. I'm looking for advice, tips, or any resources that can help me embark on this journey. Has anyone here worked on a similar project, or does anyone have experience with Sol HPC or building chatbots with local data sources? Any guidance or insights would be greatly appreciated! Thank you in advance for your help. This project means a lot to me, and I want to make sure I'm heading in the right direction. submitted by /u/ssshankyyy [link] [comments]  ( 9 min )
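    For a locally hosted chatbot over university documents, the usual pattern is retrieval-augmented generation: embed the documents, retrieve the most relevant chunks for a question, and pass them as context to a locally run LLM. A minimal retrieval-only sketch, assuming sentence-transformers and FAISS are available on the cluster (model name, chunks, and the question are placeholders):

        import faiss
        from sentence_transformers import SentenceTransformer

        chunks = ["Tuition deadlines are ...", "The library is open ..."]   # placeholder document chunks
        encoder = SentenceTransformer("all-MiniLM-L6-v2")

        emb = encoder.encode(chunks, normalize_embeddings=True)
        index = faiss.IndexFlatIP(emb.shape[1])     # inner product == cosine on normalized vectors
        index.add(emb)

        question = "When is tuition due?"
        q = encoder.encode([question], normalize_embeddings=True)
        scores, ids = index.search(q, 2)
        context = "\n".join(chunks[i] for i in ids[0])
        # `context` plus the question would then go into the prompt of a locally hosted LLM.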
    [R] UNC Researchers Present VideoDirectorGPT: Using AI to Generate Multi-Scene Videos from Text
    Generating coherent videos spanning multiple scenes from text descriptions poses unique challenges for AI. While recent progress enables creating short clips, smoothly transitioning across diverse events and maintaining continuity remains difficult. A new paper from UNC Chapel Hill proposes VIDEODIRECTORGPT, a two-stage framework attempting to address multi-scene video generation: Here are my highlights from the paper: Two-stage approach: first a language model generates detailed "video plan", then a video generation module renders scenes based on the plan Video plan contains multi-scene descriptions, entities/layouts, backgrounds, consistency groupings - guides downstream video generation Video generation module called Layout2Vid trained on images, adds spatial layout control and cross-scene consistency to existing text-to-video model Experiments show improved object layout/control in single-scene videos vs baselines Multi-scene videos display higher object consistency across scenes compared to baselines Competitive open-domain video generation performance maintained The key innovation seems to be using a large language model to plot detailed video plans to guide overall video generation. And the video generator Layout2Vid adds better spatial and temporal control through some clever tweaks. The separation of these tasks seems to matter. You can read my full summary here. There's a link to the repo there too. Paper link is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    Survival analysis in MATLAB [Project]
    Survival analysis in MATLAB Hi everyone, I'm building a predictive algorithm for DFS using Cox regression. I first used LASSO regression to select the predictive variables, and now I'm using the c-index to evaluate predictive accuracy, but it always equals 1 and I can't understand why (I tried reducing the number of variables just to see if it would change anything, but it didn't). Also, I'm working with censored data, of course. Can someone help me understand what I'm doing wrong? submitted by /u/bl4s3159 [link] [comments]  ( 9 min )
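    A c-index of exactly 1 very often means the model is being evaluated on the same data used for variable selection and fitting, or that one of the selected variables leaks the outcome. One quick check (sketched here in Python with the lifelines package purely to illustrate the held-out evaluation, not MATLAB code) is to compute the c-index on a split the model never saw:

        from lifelines import CoxPHFitter
        from lifelines.datasets import load_rossi
        from lifelines.utils import concordance_index
        from sklearn.model_selection import train_test_split

        df = load_rossi()                                   # example data: duration 'week', event 'arrest'
        train, test = train_test_split(df, test_size=0.3, random_state=0)

        cph = CoxPHFitter()
        cph.fit(train, duration_col="week", event_col="arrest")

        # higher partial hazard means shorter expected survival, hence the minus sign
        c = concordance_index(test["week"], -cph.predict_partial_hazard(test), test["arrest"])
        print("held-out c-index:", c)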
    [R] Can you help me validate my kmeans calculator for tensorflow faster rcnn model config ?
    My annotations are in pascal voc format. Below is a calculator I am testing . Not sure if I am calculating the scale and aspect ratios correctly. Please help.

        import os
        import xml.etree.ElementTree as ET

        import numpy as np
        from sklearn.cluster import KMeans


        def compute_scales_and_aspect_ratios(directory, n_clusters, img_size):
            widths = []
            heights = []
            # Collect bounding-box widths/heights from every Pascal VOC XML file in the directory.
            for filename in os.listdir(directory):
                if not filename.endswith('.xml'):
                    continue
                fullname = os.path.join(directory, filename)
                tree = ET.parse(fullname)
                root = tree.getroot()
                for obj in root.iter('object'):
                    xmlbox = obj.find('bndbox')
                    w = float(xmlbox.find('xmax').text) - float(xmlbox.find('xmin').text)
                    h = float(xmlbox.find('ymax').text) - float(xmlbox.find('ymin').text)
                    widths.append(w)
                    heights.append(h)

            widths = np.array(widths) / img_size[1]    # Normalize by image width
            heights = np.array(heights) / img_size[0]  # Normalize by image height

            scales = np.sqrt(widths * heights).reshape(-1, 1)
            aspect_ratios = (widths / heights).reshape(-1, 1)

            kmeans_scales = KMeans(n_clusters=n_clusters, random_state=0).fit(scales)
            kmeans_aspect_ratios = KMeans(n_clusters=n_clusters, random_state=0).fit(aspect_ratios)
            return kmeans_scales.cluster_centers_, kmeans_aspect_ratios.cluster_centers_


        directory = "path_to_top_folder/batch-1"
        n_clusters = 5
        img_size = (640, 1024)

        scales, aspect_ratios = compute_scales_and_aspect_ratios(directory, n_clusters, img_size)
        print('Scales:', scales.flatten())
        print('Aspect Ratios:', aspect_ratios.flatten())

    submitted by /u/dpadhy [link] [comments]  ( 9 min )
    [P] Any available datasets of children’s books or stories?
    I am looking for training data consisting of children’s stories and associated grade level. Does anyone know of any publicly available or paid datasets like this? submitted by /u/SpellboundLRN [link] [comments]  ( 9 min )
    Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve
    submitted by /u/cegras [link] [comments]  ( 9 min )
    [P] Tetris AI - Suggestions on direction to take from here? (One hot encoded dataset with 200 features)
    Hello! I'm working on a Tetris AI and am representing the 10x20 grid cubes with a one-hot encoded dataset: https://www.kaggle.com/datasets/conlan/tetris-training-set-9262023 This means my data has 208 columns (200 for the grid cubes being on/off, 7 for the “next shape” box, and 1 for the labeled best move). I currently have 9460 labeled samples and have done some preliminary fitting here: https://www.kaggle.com/code/conlan/tetris-ai?scriptVersionId=144388350 with a best f1_macro score of 0.431090. Does anyone have suggestions for which direction to take from here to improve? Currently I see: collect more data, tune hyperparameters, or rework the features. I'm hesitant to rework the features as that would require telling the model more specifics, and I would like to keep it abstract, but maybe 200 is crazy high? Or maybe <10k samples is too low and I should just keep collecting data? Thanks in advance! submitted by /u/conlanrios [link] [comments]  ( 9 min )
    [R] The Internal State of an LLM Knows When its Lying
    Paper - https://arxiv.org/abs/2304.13734 submitted by /u/MysteryInc152 [link] [comments]  ( 9 min )
    [D] Feature Transformation & Scaling
    Good morning everyone, I am currently reading Mr. Burkov's book Machine Learning Engineering. He talks about a step that might be helpful before training an ML model: feature scaling. Furthermore, he adds that before feature scaling, you might do feature transformation (log, square, ...) in order to make your data look more normal and get better models. How true do you think this statement is? Do you also transform your features, and then scale them? How often do you do it? It is important for regression or SVMs, but do you also do it for other, black-box algorithms such as Random Forests? What are the best practices according to you? submitted by /u/dekozr [link] [comments]  ( 9 min )
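    A common way to apply both steps without leaking test-set statistics is to chain them in a sklearn pipeline. A minimal sketch (the log1p transform, SVR, and the placeholder data are just illustrative choices):

        import numpy as np
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import FunctionTransformer, StandardScaler
        from sklearn.svm import SVR

        X = np.abs(np.random.randn(200, 5)) * 100   # placeholder skewed, positive features
        y = np.random.randn(200)

        model = make_pipeline(
            FunctionTransformer(np.log1p),  # feature transformation: make skewed features more normal
            StandardScaler(),               # feature scaling
            SVR(),
        )
        model.fit(X, y)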
    AAAI 24 [Discussion]
    So no discussions are going on about AAAI 2024, or have I just been unable to find any? Opening this regarding Phase 1-2 and Results discussions if anyone wants to discuss. If there already is a thread, share! For an opening question, any idea about what percentages are rejected in desk rejection, phase 1 and finally phase 2? (Roughly of course) submitted by /u/atharvandogra [link] [comments]  ( 9 min )
    [D] GPT2 diagrams are wrong
    So if you go check the source code for GPT-2, you can clearly see that the norm happens inside the attention and MLP blocks, and that the residual add is separate. This is in the official OpenAI GitHub and is relatively easy to read: https://github.com/openai/gpt-2/blob/master/src/model.py#L123-L130 (thx KingsmanVince). For some reason, a lot of online materials say that there is a full norm layer before the MLP instead of inside of it. submitted by /u/rejectedlesbian [link] [comments]  ( 9 min )
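    For reference, the structure in question is the standard pre-norm residual block; here is a minimal PyTorch sketch of that ordering (not OpenAI's TensorFlow code, just the same pattern, with LayerNorm applied to each sub-layer's input and the residual add kept separate):

        import torch.nn as nn

        class PreNormBlock(nn.Module):
            """Residual block with LayerNorm on the sub-layer inputs, GPT-2 style."""
            def __init__(self, d_model, n_heads):
                super().__init__()
                self.ln_1 = nn.LayerNorm(d_model)
                self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
                self.ln_2 = nn.LayerNorm(d_model)
                self.mlp = nn.Sequential(
                    nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
                )

            def forward(self, x):
                h = self.ln_1(x)                       # norm applied to the attention input
                a, _ = self.attn(h, h, h, need_weights=False)
                x = x + a                              # residual add is separate from the norm
                x = x + self.mlp(self.ln_2(x))         # norm applied to the MLP input
                return x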
    [D] ONNX or LibTorch for on-device training in C++
    Hi, recently I have been trying to reimplement a deep learning based object tracker in C++. However, the whole pipeline involves online training and weight updates. Is it possible to do the training for an ONNX model, using CUDA as the accelerator, in C++? If yes, how does the training speed (backprop/update) compare to LibTorch? I personally strongly prefer ONNX because it is much easier to deploy… submitted by /u/Independent_Bet1268 [link] [comments]  ( 9 min )
    [D] The quality of this sub
    Mods finally commented The only time the mods were active is when they removed the cat meme. It has been a month since that. Let's see what the mods have done to improve this sub. Here are some of the other obviously rule-breaking or off-topic posts that the mods do NOT remove: A person asking for help with their motherboard A person asking about statistics A person asking for a machine learning roadmap Another asking-for-roadmap post ... the list goes on with absolute beginner questions and low-quality posts. All these posts were written in less than 1 week. As we can see, the mods do nothing. They only remove posts that call them out. Here are posts where people discuss the status of this sub: 17 Sep 2023 2 Sep 2023 1 Aug 2023 Questions for the mods: where are you when people complain? Why do you only show up when someone calls you out? We have a few options: Report the mods and the sub for being unmoderated (see this 1 and this 2) Find other communities Gatekeep harder, tell people to go to r/learnmachinelearning r/MLQuestions r/cscareerquestions r/languagetechnology submitted by /u/March8365 [link] [comments]  ( 9 min )
    [D] Model release v0.1 from MistralAI
    EDIT: They released the model weights on HF (https://huggingface.co/mistralai) under a Apache 2.0 License. They also updated their website with documentation on how to use/run : https://docs.mistral.ai Note: I am not affiliated with Mistral AI. ​ Via their Twitter X account : magnet:?xt=urn:btih:208b101a0f51514ecf285885a8b0f6fb1a1e4d7d&dn=mistral-7B-v0.1&tr=udp%3A%2F%http://2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=https%3A%2F%http://2Ftracker1.520.jp%3A443%2Fannounce ​ https://preview.redd.it/0o46ls925rqb1.png?width=1306&format=png&auto=webp&s=7ff7ca3a510577e9ecdaa3c9ccb7ef763acc0780 submitted by /u/Fluid-Age-9266 [link] [comments]  ( 9 min )
    [D] Fine-tuning T5 for classification but not seeing loss reduction
    I am wondering if anyone has run into this before. I have fine-tuned flan-t5-xl for a classification task by generating one token from the decoder, and that fine-tuning process looks OK. I want to convert this into a T5 encoder with a classification head to save memory, so I am using Hugging Face's T5ForSequenceClassification. However, the loss does not actually decrease; it just bounces around a certain value. What could be wrong? I have tried a few learning rates and other hyperparameter tuning. submitted by /u/Chen806 [link] [comments]  ( 9 min )
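    One generic sanity check when the loss plateaus like this is to verify the model can overfit a single tiny batch; if it can't, something structural is off (label dtype, a frozen or freshly initialized head, or the learning rate). A hedged sketch using the standard Hugging Face interface (the small checkpoint, texts, and hyperparameters are purely illustrative, and the classification head will be freshly initialized):

        import torch
        from transformers import AutoTokenizer, T5ForSequenceClassification

        tok = AutoTokenizer.from_pretrained("t5-small")
        model = T5ForSequenceClassification.from_pretrained("t5-small", num_labels=2)

        texts = ["great movie", "terrible movie"] * 4
        labels = torch.tensor([1, 0] * 4)                 # class indices must be integer (long) dtype
        batch = tok(texts, padding=True, return_tensors="pt")

        model.train()
        optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
        for step in range(100):
            out = model(**batch, labels=labels)           # .loss is returned when labels are passed
            out.loss.backward()
            optimizer.step()
            optimizer.zero_grad()
            if step % 20 == 0:
                print(step, out.loss.item())
        # If this loss will not go toward zero, check that the head parameters require grad
        # and try a larger learning rate for the newly initialized head.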
    [R] Microsoft Researchers Propose DIT Morality Test for LLMs To Quantify AI Moral Reasoning Abilities
    Researchers from Microsoft have just proposed using a psychological assessment tool called the Defining Issues Test (DIT) to evaluate the moral reasoning capabilities of large language models (LLMs) like GPT-3, ChatGPT, etc. The DIT presents moral dilemmas and has subjects rate and rank the importance of various ethical considerations related to the dilemma. It allows quantifying the sophistication of moral thinking through a P-score. In this new paper, the researchers tested prominent LLMs with adapted DIT prompts containing AI-relevant moral scenarios. Key findings: Large models like GPT-3 failed to comprehend prompts and scored near random baseline in moral reasoning. ChatGPT, Text-davinci-003 and GPT-4 showed coherent moral reasoning with above-random P-scores. Surprisingly, the smaller 70B LlamaChat model outscored larger models in its P-score, demonstrating advanced ethics understanding is possible without massive parameters. The models operated mostly at intermediate conventional levels as per Kohlberg's moral development theory. No model exhibited highly mature moral reasoning. I think this is an interesting framework to evaluate and improve LLMs' moral intelligence before deploying them into sensitive real-world environments - to the extent that a model can be said to possess moral intelligence (or, seem to possess it?). Here's a link to my full summary with a lot more background on Kohlberg's model (had to read up on it since I didn't study psych). Full paper is here submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    Any good AI newsletters? I'm tired
    Any good AI (low-hype) newsletters/blogs? That's ideally sent <= 4 times a month? I'm tired of the amount of AI news I have to go through daily just to keep up. submitted by /u/onteri [link] [comments]  ( 9 min )
    AI is taking jobs away from Chinese streamers and online retailers
    AI-generated deepfake clones of Chinese livestream influencers are becoming popular on e-commerce platforms. These clones can work 24/7 and help brands sell their products without the need for human streamers. Chinese startups and tech companies are offering the service of creating these deepfake avatars for a cost of around $1,000. The technology has evolved over the years, with the need for training videos decreasing from 30 minutes to just one minute. The AI clones can mimic the movements and speech of human streamers, making them an affordable and efficient alternative for smaller brands. Source : https://www.technologyreview.com/2023/09/19/1079832/chinese-ecommerce-deepfakes-livestream-influencers-ai/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Using language models for code generation works better when limited to a specific domain
    Automatic code generation has always been an integral part of programming: compilers, synthesis tools, converters, etc. are examples of classic code generators. Now, with such powerful LLMs at hand, it is only natural to try to find new ways to generate code. The question is: are LLMs the right tool for code generation? There are two sides to code generation: (1) understanding the intent (a.k.a. capturing the spec) (2) writing the code. LLMs are great for (1), but not so good for (2). This is an example of using an LLM for general-domain code generation: https://github.com/RoboCoachTechnologies/GPT-Synthesizer You can see that the main focus here is to properly capture the spec, and that's where LLMs shine. An LLM solution for general-domain code generation may not be complete or optimized. It is always easier to break the problem down and solve code generation in a specific domain. Here you can see how much better and cleaner the output of code generation can be when it is limited to a specific domain (the robotics domain, ROS in particular, in this case): https://github.com/RoboCoachTechnologies/ROScribe What are your thoughts on using LLMs for code generation? submitted by /u/RoboCoachTech [link] [comments]  ( 9 min )
    How to stop AI deepfakes from sinking society — and science
    submitted by /u/waozen [link] [comments]  ( 9 min )
    Even the CIA is developing an AI chatbot
    The CIA is developing an AI chatbot similar to ChatGPT to help US intelligence agencies sift through large amounts of information. The program will train on publicly available data and provide sources for agents to confirm their validity. The tool will allow agents to look up information, ask follow-up questions, and summarize daunting masses of data. The exact nature of what constitutes 'public data' could spark privacy issues. The tool will be distributed to the 18-agency US intelligence community, but not to lawmakers or the public. Source : https://www.engadget.com/even-the-cia-is-developing-an-ai-chatbot-192358767.html submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Cyberpunk Multiverse
    I created this cyberpunk inspired short using Midjourney to create the pictures, RunwayML to animate them, and then edit them together using CapCut on iOS. I know the animation is still in early stages, but what do you think? Do you think we could have full length movies in a couple years? submitted by /u/Exitium_Maximus [link] [comments]  ( 9 min )
    A Simple Checklist for Self-Evaluating Prompt Quality
    How do you evaluate the quality of your prompt outputs? Here's a handy checklist. Let's have a look! You can also join r/PromptWizards to find more tutorials and prompts! Part 1: Understanding AI's Understanding You've presented a prompt to your AI, the next questions are: Has the AI accurately grasped the context? If not, how can I make sure the LLM steers my context better, should I be more direct and clear in my prompt? Can I be less negative (shows to perform less) and be more guiding to the LLM? Do the responses directly address the question or topic? Was my query and task/instruction clearly detailed in enough depth that the LLM understood what I expect? Are there any contradictions between different responses to the same prompt? If I run my prompt multiple times, i…  ( 10 min )
    OpenAI’s GPT-4 With Vision Still Has Flaws, Reveals Paper
    OpenAI's much-touted model GPT-4, lauded for its multimodal abilities, including advanced image recognition, still has significant flaws. These glitches range from inventing facts to misinterpreting chemicals' images and hate symbols, according to a new paper from OpenAI. To stay ahead of AI developments, look here first. https://preview.redd.it/seg5x4zn3uqb1.png?width=1108&format=png&auto=webp&s=635a6c58cf6255f62d8eae3077678864e5b0e248 Unintended GPT-4V behaviors GPT-4V has a tendency to hallucinate or invent facts with unwarranted confidence. The model struggles to make correct inferences, sometimes creating fictional terms by wrongly combining text strings. It misinterprets certain symbols of hate and can give incorrect answers in the context of medical imaging. OpenAI’s mitigation strategies OpenAI has implemented various safeguards to prevent GPT-4V's misuse, such as breaking CAPTCHAs or using images to infer personal details. The company insisted that GPT-4V is not to be used for identifying dangerous chemicals from image structures. OpenAI acknowledged it has a long way to go in refining the model and is working on it. Discrimination and bias When OpenAI’s production safeguards are disabled, GPT-4V displays bias against certain sexes and body types. The paper reported offensive responses related to body positivity when prompted by an image of a woman in a bathing suit. (source) P.S. If you like this kind of analysis, I write a free newsletter that dissects the most impactful AI news and research. 1000s of professionals from Google, Meta, and OpenAI read it daily. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    New Bing browser, same Bing results. Score was 10-27 btw.
    submitted by /u/degrudv [link] [comments]  ( 9 min )
    Are language Models being nerfed?
    In using AI and asking it to do simple tasks like "explain this in simpler terms" or asking it to make flashcards for me in a certain format, I am really convinced that language models (Bard and OpenAI specifically) are being nerfed. They cannot understand simple instructions as well anymore. I had a paragraph of information for one of my classes that I wanted it to make more straightforward for me before I actually went to class the next day. I spent like 30 minutes trying to get it to do that and eventually just gave up. Why don't language models feel as sharp as they did, say, a year ago? I wish I had more examples to share. Am I the only one who's noticed this? submitted by /u/Bojof12 [link] [comments]  ( 9 min )
    Looking For The Best AI Art Generator? Look No Further! (Definitive Guide for 2023)
    submitted by /u/Senior_tasteey [link] [comments]  ( 9 min )
    Looking to change my own voice for audio production
    Hi all – I’m new to this sub-Reddit, so hopefully I’m in the right place. I am working on an audio production that will span multiple episodes and hopefully multiple seasons. It will require many characters, ranging in gender, age, ethnicity, etc. I am a decent voice actor and can do many of the roles myself, but some of them I cannot fake using my voice alone. My budget is very limited, so I was hoping to find some type of software I can change my voice for the production. This can be during the recording process, or after recording… As long as it gets the job done, and makes me sound like someone else entirely. Does anybody know of a software that can achieve this? Most of the software I found either specifically designed to change users voice on the spot and is aimed at gamers changing your voice for live streams or in-game chats. I’m also on a Mac, which I know will be limiting. I’m having a hard time finding something I can use. Any suggestions will be helpful. Thank you! EDIT: To clarify, I don’t want to just change my voice to sound different in general. I want to specifically sound like a woman, an elder man, or someone of a different ethnic background. Those are just a few examples. submitted by /u/nopetoocreepy [link] [comments]  ( 9 min )
    I asked AI to create a religion and this is what it created (its weird)
    submitted by /u/GABIBBOPAZZOCINESE [link] [comments]  ( 9 min )
    My uneducated opinion on where we are going with AI - video essay.
    submitted by /u/rattuspuer [link] [comments]  ( 9 min )
    This Article Was Written Half By A Human... And Half By AI.
    submitted by /u/Senior_tasteey [link] [comments]  ( 9 min )
    CGPT-4, describe what the US would look like today had the insurrection suceeded.
    Predicting alternative historical scenarios is always fraught with complexity and uncertainty, especially concerning highly politically charged topics. However, let's entertain the hypothetical situation where the insurrection following the 2020 U.S. presidential election had succeeded, with the understanding that this is speculative reasoning. In this alternate reality, the immediate consequences would be a constitutional crisis of unprecedented proportions. Faith in democratic institutions would be severely eroded, both domestically and internationally. Trump's retaining power in this manner would spark widespread protests, perhaps more intense and larger than those seen in the summer of 2020. The unrest would likely lead to a governmental response that could be more authoritarian, poss…  ( 10 min )
    Everest.
    submitted by /u/ApprehensiveChair460 [link] [comments]  ( 9 min )
    AI art generator
    Hey, so I'm hoping to get a bit of help with finding an art generator to play around with, my only concern is the giving my entire personal information away, are there any apps for Android whereby the tos aren't crazy invasive by any chance? submitted by /u/Fluffy_Discount_9692 [link] [comments]  ( 9 min )
    Deepfake election risks trigger EU call for more generative AI safeguards
    The European Union is urging the implementation of more safeguards against the risks posed by generative AI tools to free and fair debate in democratic societies, especially during elections. The EU's values and transparency commissioner has highlighted the potential threat of AI-generated disinformation to elections and called for platforms to be vigilant and provide efficient safeguards. Mainstream platforms have made initial efforts to address the risks by implementing safeguards to inform users about the synthetic origin of content posted online. The EU commissioner is meeting with representatives from OpenAI to discuss the issue. An incoming pan-EU AI regulation, the EU AI Act, is expected to make user disclosures a legal requirement for generative AI technologies. The EU's voluntary anti-disinformation Code has 44 signatories, including major social media and search platforms, as well as entities from the ad industry and civil society organizations. Google, one of the signatories, has published a report discussing the social impacts of AI and its commitment to developing technology responsibly. Google Search has published guidance on AI-generated content and plans to integrate new innovations in watermarking, metadata, and other techniques into its generative models. The EU's Code of Practice on Disinformation is seen as a stop-gap measure until the EU AI Act is adopted and mandatory deepfake disclosures are enforced. Source : https://techcrunch.com/2023/09/26/generative-ai-disinformation-risks/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    One-Minute Daily AI News 9/26/2023
    Chinese media reported that BIDU’s Baidu AI Cloud has released ACE 3.0, an intelligent traffic solution comprehensively restructured using a foundation model. ACE means Autonomous Driving, Connected Road, and Efficient Mobility respectively.[1] BCG consultants solving business problems with OpenAI’s GPT-4 performed 23% worse than those without it, new study finds.[2] CIA Builds Its Own Artificial Intelligence Tool in Rivalry With China.[3] Facebook parent is developing bots with personalities, including a ‘sassmaster general’ robot that answers questions.[4] Sources: [1] http://www.aastocks.com/en/stocks/news/aafn-con/NOW.1296238/popular-news/AAFN [2] https://finance.yahoo.com/news/bcg-consultants-solving-business-problems-081532840.html [3] https://www.bloomberg.com/news/articles/2023-09-26/cia-builds-its-own-artificial-intelligence-tool-in-rivalry-with-china#xj4y7vzkg [4] https://www.wsj.com/tech/ai/meta-ai-chatbot-younger-users-dab6cb32 submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    How do I turn images into landscapes?
    I was wondering how someone made the destroyed building look like a cat. Anyone know how to do this? submitted by /u/Agitated-Court-2871 [link] [comments]  ( 9 min )
    Getting an A6000. What interesting things can I do with it?
    As title, I’ll be getting my hands on a couple of decent GPUs, including an old A6000, and am excited for everything its 48GB of VRAM unlocks. What’s something interesting I should do with it? A few things off the top of my head: See what crazy things stable diffusion generates at an insane resolution (how high of a resolution would 48GB allow?) Train good Dreambooth models (or what newer methods are there for style and object training?) Run and compare various open-source LLMs (should be able to run 70b models? Generate something of decent length with MusicGen Gaussian Splatting Distribute voice recognition, TTS, audio2face, LLM, and rendering across 2 or 3 machines to create a realistic virtual human (suggestions for excellent TTS would be appreciated) What other interesting models are out there to experiment with? submitted by /u/DsDman [link] [comments]  ( 9 min )
    Online Training from Demonstrations
    I would like to embark on online training for an F1TENTH racing car, starting from scratch and leveraging demonstration data. Currently, it appears that DDPGfD is a promising approach. Does anyone have any research papers they can recommend or suggestions on how to get started? submitted by /u/anointedninja [link] [comments]  ( 9 min )
    "What If the Robots Were Very Nice While They Took Over the World?" (reflections on CICERO & _Diplomacy_)
    submitted by /u/gwern [link] [comments]  ( 9 min )
    Advice on getting started with a career in reinforcement learning
    Reinforcement learning has grabbed my interest pretty firmly and has been my focus for the last 3 months or so. I spend most of my time working in Python, Rust, and now Mojo. Not an expert yet, but my coding skills are improving. I have no degree and have taught myself most of what I know. That part is why I'm looking for advice from you all. Practically every job post I've seen has college requirements. Is it unlikely to get hired without a degree? Additional information: I'm currently working on projects for GitHub but those aren't quite done. My main interest is related to RL in game design, and applications of distributional RL in action-dense environments and VR. Currently using the Godot engine the most, and I have used PyTorch, OpenAI Gym, and TensorFlow (to a lesser degree). The abstract concepts of neural networks come easily to me, and I've been following basic neurology as well. submitted by /u/SchrodingersCog [link] [comments]  ( 9 min )
    How to modify DQN to not overfit for action that concludes episode
    Edit: I may be jumping the gun here but I think I figured it out (looks good so far). I give the episode reward for every action EXCEPT the "end early" action, now I will need to give some boost for shorter episodes to achieve the desired effect :) I feel like I'm experiencing déjà vu, posing another DQN-related question. But, here's my issue: I've set up an environment where an agent can interact for 40 steps or choose to end the interaction early with a specific action. The catch is that the reward is only given at the end of the episode, which seems to be leading the agent to strongly favor the "end early" action. Despite all other steps getting a reward of 0, I assumed the long-term reward estimate, V(s_{t+1}), would mitigate this, but the agent still heavily gravitates towards ending the episode early. Attempted Solutions: Distributed the end-of-episode reward across all prior actions taken by the agent. Considering: Replacing the "end early" action with a "do nothing" action, allowing the episode to always play out in full. However, this seems like it could introduce additional computational costs and noise. Has anyone encountered a similar problem? I'd appreciate any advice or recommendations. submitted by /u/Vae94 [link] [comments]  ( 9 min )
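    A rough sketch of the workaround described in the edit (one reading of it, not a standard recipe): buffer a finished episode's transitions, then rewrite their rewards before adding them to the replay buffer, giving the terminal reward to every step except the "end early" action and adding a small bonus for shorter episodes. All names and constants below are placeholders:

        def reshape_episode(transitions, terminal_reward, end_early_action,
                            max_steps=40, shortness_bonus=0.01):
            """transitions: list of (state, action, reward, next_state, done) for one finished episode."""
            episode_len = len(transitions)
            bonus = shortness_bonus * (max_steps - episode_len)   # reward finishing in fewer steps
            shaped = []
            for state, action, _ignored_reward, next_state, done in transitions:
                r = 0.0 if action == end_early_action else terminal_reward
                shaped.append((state, action, r + bonus, next_state, done))
            return shaped

        # usage: replay_buffer.extend(reshape_episode(episode, final_reward, END_EARLY_ACTION))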
    DeepMind built an excellent Stratego bot. Can I get an ELI5 of the underlying technologies, DeepNash and R-NaD?
    I learned about DeepNash and R-NaD yesterday. I read the Deepmind article, Science paper, and the source code of rnad.py. But I don't think I understand it! Part of this is that they didn't define all the terms and Greek they use in the paper, and part of it is that I don't have academic ML experience. Below is my attempt to summarize the paper in non-academic terms. I'm trying to show that I did my homework, and also I'm trying to invoke Godwin's Law in the hopes that someone will come along and correct me. Here goes: Naïve reinforcement learning doesn't work with simultaneous choice games such as matching pennies or Rock-Paper-Scissors. In naïve RL, If I choose Rock as my move, my opponent chooses Scissors as their move, and I see that I won, that will reinforce a belief that Rock is a "good" move and Scissors is a "bad" move. But this isn't true! This means that, during selfplay, a naïve RL agent will just cycle through strategies, as the timestep-(τ_n) agent learns how to beat the timestep-(τ_n-1) agent. The agent will never learn that RPS is a game about staying unpredictable! R-NaD fixes this by adjusting the reward function. I think "regularizing" is ML-academic speak for "adjusting". It adjusts the reward function in such a way that the agent will converge at a Nash equilibrium strategy. The paper's equation (1) describes how the regularization works. They didn't explain all the terms, though. I still don't know what a_i represents. But I think it corresponds to parts of the code like this line and this line. The key is that we're merging policies from multiple epochs and making sure that the current agent's move probabilities fare well against not only itself, but also against its previous two generations. They've proven that three generations is all you need to eventually converge to a Nash equilibrium. So... that's my understanding. Does anyone with actual ML experience want to weigh in? submitted by /u/lord_braleigh [link] [comments]  ( 10 min )
    Re-imagining the opera of the future
    The iconic sci-fi opera “VALIS,” first composed by Professor Tod Machover in 1987, reboots at MIT for a new generation.  ( 11 min )
    From physics to generative AI: An AI model for advanced pattern generation
    Inspired by physics, a new generative model PFGM++ outperforms diffusion models in image generation.  ( 10 min )
    A generative AI-powered solution on Amazon SageMaker to help Amazon EU Design and Construction
    The Amazon EU Design and Construction (Amazon D&C) team is the engineering team designing and constructing Amazon Warehouses across Europe and the MENA region. The design and deployment processes of projects involve many types of Requests for Information (RFIs) about engineering requirements regarding Amazon and project-specific guidelines. These requests range from simple retrieval of baseline […]  ( 13 min )
    MDaudit uses AI to improve revenue outcomes for healthcare customers
    MDaudit provides a cloud-based billing compliance and revenue integrity software as a service (SaaS) platform to more than 70,000 healthcare providers and 1,500 healthcare facilities, ensuring healthcare customers maintain regulatory compliance and retain revenue. Working with the top 60+ US healthcare networks, MDaudit needs to be able to scale its artificial intelligence (AI) capabilities to […]  ( 5 min )
    DENZA Unwraps Smart Driving Options for N7 Model Lineup, Powered by NVIDIA DRIVE Orin
    DENZA, the luxury electric-vehicle brand and joint venture between BYD and Mercedes-Benz, is debuting new intelligent driving features for its entire N7 model lineup, powered by the NVIDIA DRIVE Orin system-on-a-chip (SoC). The N7 series was introduced earlier this year as a family of spacious five-seater SUVs for commuters looking to sport a deluxe EV Read article >  ( 5 min )
    The Fastest Path: Healthcare Startup Uses AI to Analyze Cancer Cells in the Operating Room
    Medical-device company Invenio Imaging is developing technology that enables surgeons to evaluate tissue biopsies in the operating room, immediately after samples are collected — providing in just three minutes AI-accelerated insights that would otherwise take weeks to obtain from a pathology lab. In a surgical biopsy, a medical professional removes samples of cells or tissue Read article >  ( 6 min )
    NVIDIA Works With NTT DOCOMO to Launch World’s First GPU-Accelerated 5G Network
    As generative AI sweeps across corporate boardrooms around the world, global telecommunications companies are exploring how to cost-effectively deliver many of these new AI applications to the edge over 5G and upcoming 6G networks. Telcos plan to deploy over 17 million 5G microcells and towers worldwide by 2025. Building, managing and optimizing this new infrastructure Read article >  ( 6 min )
    Research Focus: Week of September 25, 2023
    Chunked prefills & decode-maximal batching boost LLM inference; DragNUWA combines text, image, and trajectory for fine-grained video content control; reconstructing images from human brain signals; structural inequalities in creator-audience relationships. The post Research Focus: Week of September 25, 2023 appeared first on Microsoft Research.  ( 9 min )
    Circular coordinate art
    About three years ago I ran across a strange coordinate system in which familiar functions lead to interesting plots. The system is called “circular coordinates” but it is not polar coordinates. This morning I was playing around with this again. Here’s a plot of f(x) = x. And here’s a plot of f(x) = cos(8x). […] Circular coordinate art first appeared on John D. Cook.  ( 5 min )

    [D] Implementation of ChatGPT-steered Editing Instructor for Customization of Abstractive Summarization
    I found the paper “ChatGPT-steered Editing Instructor for Customization of Abstractive Summarization”, published in March, and I was looking for information about the cost of training such a system. Has anyone tried it? Are there any weights out in the wild already trained for the instructor model? I have found the GitHub repository associated with the paper, but it only contains the code for training, with no information about the approximate tokens used or anything like that. submitted by /u/Agreeable-Committee6 [link] [comments]  ( 9 min )
    [P] Interact with an OWL-ViT Object Detection Model
    We noticed a lot of people wanting to deploy computer vision models, so we built an interactive demo of OWL-ViT to show how it might be used by an end user when integrated into a product. OWL-ViT is a new object detection model from the team at Google Research. It allows you to identify an object in one image (the “query image”) and then find that same object in any number of target images. Here is the link to interact with an OWL-ViT model! submitted by /u/modelbit [link] [comments]  ( 9 min )
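    For anyone who wants to try OWL-ViT locally rather than through the demo, here is a minimal text-conditioned sketch using the Hugging Face transformers implementation (the checkpoint, image URL, and query strings are just examples; the image-conditioned "query image" mode described above uses a separate method, and older transformers versions expose post_process instead of post_process_object_detection):

        import requests
        import torch
        from PIL import Image
        from transformers import OwlViTProcessor, OwlViTForObjectDetection

        processor = OwlViTProcessor.from_pretrained("google/owlvit-base-patch32")
        model = OwlViTForObjectDetection.from_pretrained("google/owlvit-base-patch32")

        url = "http://images.cocodataset.org/val2017/000000039769.jpg"
        image = Image.open(requests.get(url, stream=True).raw)
        texts = [["a photo of a cat", "a photo of a remote control"]]

        inputs = processor(text=texts, images=image, return_tensors="pt")
        with torch.no_grad():
            outputs = model(**inputs)

        # Convert raw logits/boxes to thresholded detections in pixel coordinates.
        target_sizes = torch.tensor([image.size[::-1]])
        results = processor.post_process_object_detection(outputs, threshold=0.1, target_sizes=target_sizes)
        print(results[0]["scores"], results[0]["labels"], results[0]["boxes"])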
    Question about dataset [D]
    Hey everyone, I'm a novice at ML trying to do a project on my own. I am trying to predict the rainfall amount in inches for a given day. I've decided to make it a classification problem and predict a rainfall range, such as 0-0.5 inches or 1-1.5 inches. My dataset has ~40,000 samples; however, I have noticed that 24,000 of them have 0.0 as the amount of rainfall, and a high percentage of the rest are very low, below 0.5 inches. I'm wondering if there's still a way to create the type of model I had originally intended. Is there a way to reduce the size of my dataset, specifically the number of low values, without losing important feature information? Thank you and any help is appreciated :) submitted by /u/RepeatResponsible499 [link] [comments]  ( 9 min )
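    One simple option (a sketch, not the only answer) is to bin the target and downsample the dominant zero-rain class before training, e.g. with pandas and sklearn's resample utility. Column names, bin edges, and the synthetic data below are placeholders:

        import numpy as np
        import pandas as pd
        from sklearn.utils import resample

        df = pd.DataFrame({"rainfall": np.random.exponential(0.2, 40000) * (np.random.rand(40000) > 0.6)})

        # Bin continuous rainfall into classes (edges are illustrative).
        df["rain_class"] = pd.cut(df["rainfall"], bins=[-0.001, 0.0, 0.5, 1.0, 1.5, np.inf],
                                  labels=["none", "0-0.5", "0.5-1", "1-1.5", "1.5+"])

        dry = df[df["rain_class"] == "none"]
        wet = df[df["rain_class"] != "none"]

        # Keep only as many dry days as wet days (random undersampling of the majority class).
        dry_down = resample(dry, replace=False, n_samples=len(wet), random_state=0)
        balanced = pd.concat([dry_down, wet]).sample(frac=1, random_state=0)
        print(balanced["rain_class"].value_counts())

    Alternatively, many sklearn classifiers accept class_weight="balanced", which reweights the classes so that no samples have to be discarded at all.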
    [D] Asus ROG Zephyrus vs Macbook Pro for ML (PhD Student)
    Hi all, I understand it all comes down to personal preference and that it is an old topic, but a bit advice would be welcome. My current workload consists of analyzing large medical records, medical images (upcoming work) with mainly PyTorch. Now I have direct and remote access to my personal lab pc which has configuration: core i9-9900K, 32 GB Ram, GTX 2080Ti 12 GB, Windows 11. Now I am planning to buy a laptop that would help with coursework, research paper reading and remote access to my lab PC. It should last at least 4/5 years (My current 5-year-old MSI laptop's hinge broke). I have the following laptops in mind with a budget of around $2000 14-inch Macbook Pro with 16 GB RAM and M2 PRO = $1999 ASUS ROG Zephyrus 15.6" WQHD 165Hz Gaming Laptop, AMD Ryzen 9 6900HS,16GB DDR5 4800Mhz RAM, 1TB SSD PCIe 4.0 Storage, NVIDIA GeForce RTX 3060 = $1400 submitted by /u/Furiousguy79 [link] [comments]  ( 9 min )
    Is Rust a thing in ML? [D]
    I've seen some people saying that Python is for training models and Rust is for deploying them. Is this a widespread practice, or is it just a localized need for companies with “performance-sensitive” models? submitted by /u/horace_desplein [link] [comments]  ( 9 min )
    [D] Announcing Boomerang - Vectara's new embedding model
    Happy to share Vectara's new state-of-the-art embedding model, called Boomerang. Embedding models were so far not too much in the spotlight relative to chat models like ChatGPT, but for Retrieval-augmented-generation applications, getting the best embedding model matters a lot. would love to hear what has been the experience of others in this respect - what embedding models have worked best so far with RAG? Blog post: https://vectara.com/introducing-boomerang-vectaras-new-and-improved-retrieval-model/ Hackernews: https://news.ycombinator.com/item?id=37661359 submitted by /u/ofermend [link] [comments]  ( 9 min )
    [R] Automated Quality Assurance for Object Detection Datasets
    Would you deploy a self-driving car model that was trained on images for which data annotators accidentally forgot to highlight some pedestrians? Errors in object detection examples found via cleanlab. Annotators of real-world object detection datasets often make such errors and many other mistakes. To avoid training models on erroneous data and save QA teams significant time, you can now use automated algorithms invented by our scientists. Our newest paper introduces Cleanlab Object Detection: a novel algorithm to assess label quality in any object detection dataset and catch errors (named ObjectLab for short). Extensive benchmarks show Cleanlab Object Detection identifies mislabeled images with better precision/recall than other approaches. When applied to the famous COCO dataset, Cleanlab Object Detection automatically discovers hundreds of mislabeled images, including errors where annotators mistakenly: overlooked an object that should’ve had a bounding box, sloppily drew a box in a poor location, or chose the wrong class label for an annotated object. We’ve open-sourced one line of code to find errors in any object detection dataset via Cleanlab Object Detection, which can utilize any existing object detection model you’ve trained. For those interested, you can check out the 5-minute tutorial to get started and the blog to read the details. submitted by /u/jonas__m [link] [comments]  ( 9 min )
    [R] 🤖🎸 Need directions to embed and query structured table data for a music recommendation system
    Hi there community, I hope everyone is doing well ::] I'm exploring the ada-002 embedding model for building a recommendation system (along with some other similarity-search things like generating playlists), so naturally a lot of questions started to pop up. But before going deeper, let me explain what I am building and how the data is structured: Imagine a music app with song recommendations based on all the users' history and musical metadata. Currently I have a table with a bit of data in it just for tests - the users, the artists and the songs. Each of these columns has its own rows; for example, songs have genres, danceability, number of likes, etc. I am now implementing two more columns for history logs - a “history” (that will be related to users and songs) and a “session” (which is a coll…  ( 12 min )
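    A common pattern for embedding structured rows with ada-002 is to serialize each row into a short text description first and embed that. A minimal sketch with the pre-1.0 openai Python package (field names and the serialization template are made up for illustration, and an OPENAI_API_KEY is assumed to be set):

        import numpy as np
        import openai

        rows = [
            {"title": "Song A", "artist": "Artist X", "genre": "indie", "danceability": 0.71},
            {"title": "Song B", "artist": "Artist Y", "genre": "techno", "danceability": 0.93},
        ]

        # Serialize each structured row into text before embedding.
        texts = [f"song: {r['title']}; artist: {r['artist']}; genre: {r['genre']}; "
                 f"danceability: {r['danceability']}" for r in rows]
        resp = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
        vectors = np.array([item["embedding"] for item in resp["data"]])

        def cosine(a, b):
            return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

        print(cosine(vectors[0], vectors[1]))   # similarity between the two songs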
    [P] Where can I find pre-annotated image datasets?
    I am trying to do an object detection project. Does anyone know where I can find a pre-annotated image dataset? submitted by /u/Nomadic-Foe-011 [link] [comments]  ( 9 min )
    [R][P][D] Scene Encoder like ViT L/14 from CLIP but for 3D Scenes
    I'm working on my thesis and want to perform 3D scene understanding and VQA. My scenes would be textured meshes (or point clouds). My goal is not only to know the objects present in the scene but also the spatial relationships between them, like the chair is in front of the couch, the bottle is on the table, etc. I want to know if there is a 3D scene encoder like the 2D image encoder ViT L/14 from CLIP. My search hasn't turned up much yet in this direction, but I have come across papers that render a 3D scene from multiple angles and then use 2D scene encoders on them. So I'd like to ask the community: Are there 3D scene encoders like CLIP's ViT? If not, is there any other way I can approach this problem? submitted by /u/Bluebird705 [link] [comments]  ( 9 min )
    [Research] Exciting New Paper on StyleGAN Domain Adaptation: StyleDomain - ICCV 2023
    Hey, fellow machine learning enthusiasts! AIRI researchers are thrilled to share some exciting news with you all. Our paper, "StyleDomain: Efficient and Lightweight Parameterizations of StyleGAN for One-shot and Few-shot Domain Adaptation", has been accepted to ICCV 2023! 🥳 Abstract: Domain adaptation of GANs is a problem of fine-tuning GAN models pretrained on a large dataset (e.g., StyleGAN) to a specific domain with few samples (e.g., painting faces, sketches, etc.). While there are many methods that tackle this problem in different ways, there are still many important questions that remain unanswered. In this paper, we provide a systematic and in-depth analysis of the domain adaptation problem of GANs, focusing on the StyleGAN model. We perform a detailed exploration of the most i…  ( 10 min )
    [D] What are some good AI tools to help you in your own 2D digital art? Software or apps that help you improve and speed up your drawing/colouring process.
    Title pretty much says it all. It would be really cool if we had more AI tools that don't just straight up generate an image but help artists in their own art process. submitted by /u/salehxoxo [link] [comments]  ( 9 min )
    [D] How did you succeed in a new role? What lessons did you take from your previous role?
    When switching to a new role, what did you do to ensure that you succeeded? What lessons did you learn from your previous job that you took into your new job? For example, I'm in the process of switching jobs, and one of the things I've learnt is that when delivering results (during fire drills) the way I write my code is focused on simply getting the results out vs being organized, efficient and scalable. While I do get from point A to point B, the way I get there is not the most efficient. I think something I can do is take a step back and take a top-down approach to problem solving when I enter my new role. submitted by /u/Terrible-Hamster-342 [link] [comments]  ( 9 min )
    [N] NEXT WEEK ICCV - Feel at ICCV as if you were at ICCV!
    Next week will take place the International Conference on Computer Vision ICCV2023 in Paris. If you are not going, stay in touch by subscribing to the ICCV Daily magazine. It's free: https://www.rsipvision.com/feel-iccv-iccv/ Full daily previews and reports of selected ICCV papers and events. https://preview.redd.it/yxmf2ksomlqb1.jpg?width=794&format=pjpg&auto=webp&s=7063c770e7a02d0ca7bba6f41ecc36438aa86256 submitted by /u/Gletta [link] [comments]  ( 9 min )
    [News] Speech Technology with Tencent AI Lab’s AutoPrep for Optimal Unstructured Speech Data Processing
    https://preview.redd.it/4kmpzlb5clqb1.jpg?width=1140&format=pjpg&auto=webp&s=b09660dfebbd5947dc14020ca43df29b05cb82d9 In a recent development, Tencent AI Lab has launched AutoPrep, a preprocessing framework explicitly crafted for in-the-wild speech data. This innovative framework is positioned to change the landscape of speech data processing by offering automated preprocessing and high-quality annotation for unstructured speech data, addressing the longstanding challenges in the field. Read the full story submitted by /u/El-Professor-1 [link] [comments]  ( 9 min )
    [R] DeepMind: Using small-scale proxies to hunt and solve large-scale transformer training instabilities
    Training giant AI models like GPT-3 requires large resources - thousands of GPUs running for months. As a solo researcher without access to that kind of scale, I can't easily reproduce experiments and findings from papers on huge models. But a new paper from DeepMind shows you can recreate and study training instabilities seen in massive models by using small ones. The key is increasing the learning rate: This reproduces "attention collapse" where the model focuses on just a few tokens, like overfitting. Also can reproduce "logit divergence" where output values drift unstably. These issues have been reported when scaling up to billions of params. The cool part is techniques that fix them for giant models also work for small models: qk-layernorm prevents attention collapse. Adding a "z-loss" term stops logit divergence. Some other highlights from the paper include: Longer warmup helps stability, especially for bigger models. Decoupling LR and weight decay improves stability. Depth increases sensitivity much faster than width. Can predict upcoming issues from scaling trends. Default epsilon hurts at large scale. If the authors are right, one more tool that lets researchers study and even help train giant models without Google-size resources. Small models can guide large model development, sort of like how you can build a scale train set to study and improve how a railroad system works... for a lot less money than starting your own railroad company, buying land, building real tracks, etc. Full summary. Original paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
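    For concreteness, here is a hedged PyTorch sketch of the two stabilizers mentioned above: qk-layernorm (normalize queries and keys before the dot product) and a z-loss on the output logits. Shapes and the z-loss coefficient are illustrative, not taken from the paper's exact setup:

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        d, h = 512, 8                                       # model width and heads (illustrative)
        q_norm, k_norm = nn.LayerNorm(d // h), nn.LayerNorm(d // h)

        def qk_layernorm_attention(q, k, v):
            # q, k, v: (batch, heads, seq, head_dim); normalize q and k before the dot product
            q, k = q_norm(q), k_norm(k)
            scores = q @ k.transpose(-2, -1) / (q.shape[-1] ** 0.5)
            return F.softmax(scores, dim=-1) @ v

        def z_loss(logits, coeff=1e-4):
            # Penalize the log of the softmax normalizer so output logits cannot drift to infinity.
            log_z = torch.logsumexp(logits, dim=-1)
            return coeff * (log_z ** 2).mean()

        # usage during training: total_loss = cross_entropy_loss + z_loss(logits)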
    [R] Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization
    submitted by /u/JustAddMoreLayers [link] [comments]  ( 9 min )
    [Research] Analysis of Back Pain Using Biomechanics and Artificial Intelligence (ML)
    Worked on this beautiful analysis for some time and wanted to share how nice data visualization animations can look. It makes the whole process of data science feel like a story. And AI is mere mathematics. Mathematics are beautiful, and they can be understood A paper will be published soon, detailing the algorithms and the final results https://youtu.be/P-gHTqxCo_g?si=7clD0nb16EIDprkW submitted by /u/SemperZero [link] [comments]  ( 9 min )
    [D] John Carmack and Rich Sutton partner to accelerate development of Artificial General Intelligence
    John Carmack, celebrated software engineer and founder of Keen Technologies, and Dr. Richard Sutton, Chief Scientific Advisor at the Alberta Machine Intelligence Institute (Amii), announce a partnership to bring greater focus and urgency to the creation of artificial general intelligence (AGI). This partnership is the first public milestone for Keen Technologies, following its initial funding announcement in August of 2022. The initial $20M round was led by Nat Friedman, Daniel Gross, Patrick Collison, Tobi Lutke, Jim Keller, Sequoia Capital, and Capital Factory. In December 2022, Carmack departed as consulting CTO at Meta to focus his efforts on AGI. Sutton is the principal founder of the field of reinforcement learning. His work with Keen Technologies advances previously announced research priorities to understand basic computational intelligence. Through this work, documented in part in The Alberta Plan, Sutton seeks to understand and create long-lived computational agents that interact with a vastly more complex world and come to predict and control their sensory input signals. https://www.amii.ca/latest-from-amii/john-carmack-and-rich-sutton-agi/ submitted by /u/blabboy [link] [comments]  ( 9 min )
    [P] - VkFFT now supports quad precision (double-double) FFT computation on GPU
    Hello, I am the creator of VkFFT - the GPU Fast Fourier Transform library for Vulkan/CUDA/HIP/OpenCL/Level Zero and Metal. In the latest update, I have added support for quad-precision double-double emulation for FFT calculation on most modern GPUs. I understand that modern ML is going in the opposite, low-precision direction, but I still think that it may be useful to have this functionality at least for some prototyping and development of concepts. The double-double approach to the evaluation of quads represents an FP128 number as an unevaluated sum of two double numbers (like 1 and 1e-17 - the second one is smaller than 1 ULP of the first one). This increases the significand from 53 to 106 bits, allowing operations on numbers that vary by up to 32 orders of magnitude. The range of numb…  ( 10 min )
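    To illustrate the representation (a sketch in Python, not VkFFT's actual GPU code): the classic error-free two-sum splits a floating-point sum into a rounded result plus the exact rounding error, which is how a value like 1 + 1e-17 can be carried as a pair of doubles:

        def two_sum(a, b):
            # Error-free transformation: s + err == a + b exactly, with s = fl(a + b).
            s = a + b
            bb = s - a
            err = (a - (s - bb)) + (b - bb)
            return s, err

        hi, lo = two_sum(1.0, 1e-17)
        print(hi, lo)          # 1.0 1e-17  -> the small part survives as the second double
        print(1.0 + 1e-17)     # 1.0        -> lost entirely in a single double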
    [Discussion] Why should better networks be end-to-end? (Or why not?)
    TYPO IN THE TITLE: I wanted the title to be "why should neural networks be end-to-end?" My lazy ass was swipe texting on my phone and this typo happened. FML Especially in robotics, there's been a lot of research on end-to-end neural networks where an image is the input and the control action is the output, for, say, tasks like picking an object and placing it somewhere. I feel this is very restrictive while developing such a solution because it tightly couples the "control" network to the "estimation" network. This reduces modularity in building the solution, i.e., if I find a better controller architecture (machine learning based or rule based) I'm unable to use that for the task I'm solving. Moreover, it seems like the generalizability of this task suffers, because training this network to do this task may make it very good at only doing that specific task, and the explainability of the decision making goes out of the window because it is a black box. Additionally, don't more parameters mean requiring more data to train the model? I don't see much use from such network architectures. However, I do see the applications in machine translation where you simply train the encoder and decoder to the target language at the same time with a single loss function. It seems useful there, but maybe not so much in robotics. Can someone tell me their thoughts on end-to-end architectures? Let's have a healthy discussion Edit: typo submitted by /u/piccadilly_nickadeli [link] [comments]  ( 9 min )
    [D] Podcasts about AI and Machine Learning?
    As the title says, what are the best podcasts to listen to that discuss new machine learning and AI advancements, new papers, and such? submitted by /u/darthJOYBOY [link] [comments]  ( 9 min )
    [P][R][D] Bio inspired algorithm for recommender system
    Hi! I am working on a content based recommender system that uses bio inspired optimization algorithms. The problem is, there aren't many resources online pertaining to this concept. Initially i thought it could be used to optimize text similarity. So is it possible to do that? Or it could be used to optimize any other aspect of the system? Please let me know if you have any knowledge! Thanks! submitted by /u/Jellyfishh_13 [link] [comments]  ( 9 min )
    [D] How are machine learning videos made, and what platform is best?
    Is this done with computer vision or somehow within the game itself? Also, what would you recommend as a resource to learning about machine learning fast? Is Microsoft Azure worth it or should I use TensorFlow? (I'll mostly do audio classification and game simulations) https://www.youtube.com/watch?v=tVNoetVLuQg Thanks so much! Please recommend go-to resources you've used to get up and running. Looking forward to what yall recommend. submitted by /u/Fit-Replacement7245 [link] [comments]  ( 9 min )
  • Open

    When there is only one group of a given size
    Today’s date, US style, is 9/26/2023, and there is only one group, up to isomorphism, of size 9262023. You could verify this in Mathematica with the command FiniteGroupCount[9262023] which returns 1. For a given n, when is there only one group of size n? There are two requirements. First, n has to be the product […] When there is only one group of a given size first appeared on John D. Cook.  ( 5 min )
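    (The excerpt is cut off before the requirements are stated; for context, the classical criterion, which I believe is the one the post goes on to describe, can be written compactly as follows.)
        \[
          \text{there is exactly one group of order } n
          \;\iff\; \gcd\!\big(n,\ \varphi(n)\big) = 1,
        \]
        \[
          \text{equivalently, } n = p_1 p_2 \cdots p_k \text{ with distinct primes } p_i
          \text{ and } p_i \nmid (p_j - 1) \text{ for all } i \neq j .
        \]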
    Analogy between prime numbers and simple groups
    Simple groups are the building blocks of groups similar to the way prime numbers are the building blocks of integers. This post will unpack this analogy in two ways: How do simple groups compare to prime numbers? How does the composition of simple groups compare to the composition of prime numbers? The former analogy is […] Analogy between prime numbers and simple groups first appeared on John D. Cook.  ( 6 min )
    Normal and non-normal subgroups
    The word “normal” in mathematical nomenclature does not always mean “usual” or “customary” as it does in colloquial English. Instead, it might mean that something has a convenient property. That is the case for normal subgroups. We can do things with normal subgroups that we cannot do with other subgroups, such as take quotients, and so […] Normal and non-normal subgroups first appeared on John D. Cook.  ( 7 min )
  • Open

    Any alternative tools to Otter.ai?
    Hey, long story short, I've used Otter.ai for recording and transcribing my ideas on the fly and it's really, really good! The only thing it is missing for my use case is being able to edit the transcripts (remove some parts, for example) and then have that piece be removed from the audio file as well, so you can see how long the actual useful part is. I also need it to have an app, since the whole point of doing this is catching ideas that just rush to my head. Apparently DeScribe has this option, but I haven't tried it and it doesn't work on mobile anyway. I know it's probably not available, but does anyone know any services similar to this? I don't need an AI bot, don't care about integration with other apps, and will not use it for meetings. TLDR: I just want an app that lets me record and then transcribe my ideas, then allows me to edit/fine-tune the transcript and have the audio file edited in the same way as well. Thanks! submitted by /u/reza2kn [link] [comments]  ( 9 min )
    Is there an AI I can use where I can upload vocals of a song I've written and have a backing track made for it?
    I have lots of lyrics I've written with the melody but I don't know how to play an instrument. submitted by /u/82brighteyes [link] [comments]  ( 9 min )
    Generate Famous Person with a Random T-Shirt
    Hello all, Is it possible to use a tool or site for free that generates any random historical figure with a shirt of my choosing? Thank you all submitted by /u/JYanezez [link] [comments]  ( 9 min )
    Adversarial AI Attacks: Hidden Threats
    submitted by /u/stefanbg92 [link] [comments]  ( 8 min )
    Prompt Chaining: Elevating Task Automation with LLMs
    👋 Hey Reddit! Let's dive into the realm of Prompt Chaining. If you want to check out more prompt chain examples, then we invite you to join our community at r/PromptWizards. 🔗 Prompt Chaining: More Than Meets the Eye In the world of AI interaction, Q&A sessions with ChatGPT are thrilling. They offer fascinating glimpses into AI's creative potential and can even transform into a productive brainstorming session. But what happens when we need reliable, consistent outputs, especially for applied use cases? Enter Prompt Chaining. Prompt Chaining is a technique that breaks down complex tasks into manageable sub-steps and induces a chain reaction of responses. It allows us to use the output of one prompt as the input for the next, thereby creating a coherent, consistent, and reliable chai…  ( 10 min )
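    As a concrete illustration of the idea, a two-step chain can be as small as the sketch below (written against the pre-1.0 openai Python package that was current at the time; the model name and prompts are placeholders, not part of the original post).
        import openai  # assumes OPENAI_API_KEY is set in the environment

        def ask(prompt: str) -> str:
            resp = openai.ChatCompletion.create(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": prompt}],
            )
            return resp["choices"][0]["message"]["content"]

        # Step 1 produces an intermediate result; step 2 consumes it as input.
        outline = ask("List three key points for a short post about prompt chaining.")
        draft = ask(f"Write a short, friendly post expanding on these points:\n{outline}")
        print(draft)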
    AI for realistic images generated from pictures
    I would like to make realistic stuff using screenshots I took in video games. I know there are plenty of text-to-image AI tools, but are you guys familiar with image-to-image ones? submitted by /u/LauraLuna99 [link] [comments]  ( 9 min )
    DeepMind: Increasing learning rate in small models lets you reproduce errors in large ones
    Training giant AI models like GPT-3 requires large resources - thousands of GPUs running for months. As a solo researcher without access to that kind of scale, I can't easily reproduce experiments and findings from papers on huge models. But a new paper from DeepMind shows you can recreate and study training instabilities seen in massive models by using small ones. The key is increasing the learning rate: this reproduces "attention collapse" where the model focuses on just a few tokens, like overfitting. It can also reproduce "logit divergence" where output values drift unstably. These issues have been reported when scaling up to billions of params. The cool part is techniques that fix them for giant models also work for small models: qk-layernorm prevents attention collapse. Adding a "z-loss" term stops logit divergence. Some other highlights from the paper include: Longer warmup helps stability, especially for bigger models. Decoupling LR and weight decay improves stability. Depth increases sensitivity much faster than width. Upcoming issues can be predicted from scaling trends. Default epsilon hurts at large scale. If the authors are right, this is one more tool that lets researchers study and even help train giant models without Google-size resources. Small models can guide large model development, sort of like how you can build a scale train set to study and improve how a railroad system works... for a lot less money than starting your own railroad company, buying land, building real tracks, etc. Full summary. Original paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
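    For readers who want to see what the two fixes look like in code, here is a rough PyTorch sketch based on my reading of the summary above (single-head attention for brevity; the exact placement of the norms and the z-loss coefficient are assumptions, not taken from the paper).
        import torch
        import torch.nn as nn

        class QKLayerNormAttention(nn.Module):
            """Self-attention with layernorm applied to queries and keys ("qk-layernorm")."""
            def __init__(self, dim: int):
                super().__init__()
                self.q_proj, self.k_proj, self.v_proj = (nn.Linear(dim, dim) for _ in range(3))
                self.q_norm = nn.LayerNorm(dim)  # normalize queries before the dot product
                self.k_norm = nn.LayerNorm(dim)  # normalize keys before the dot product

            def forward(self, x):
                q = self.q_norm(self.q_proj(x))
                k = self.k_norm(self.k_proj(x))
                v = self.v_proj(x)
                attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
                return attn @ v

        def z_loss(logits, coeff: float = 1e-4):
            """Auxiliary term penalizing the log-partition function so logits don't drift."""
            return coeff * torch.logsumexp(logits, dim=-1).pow(2).mean()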
    CGPT-4, explain how, perhaps in two or three years, AIs will recursively and autonomously program more intelligent iterations of themselves
    The idea of AI systems engaging in recursive self-improvement is a key tenet of what some call the "intelligence explosion" hypothesis. Here's how it might go down within the next few years: Baseline Capability: First, we start with a machine learning model that's good, but not necessarily groundbreaking. The important part is that it has some capacity for basic code generation and analysis. Guided Improvements: Human experts will likely initiate the process by enabling the AI to optimize specific aspects of its code. We're talking parameters, efficiency tweaks—stuff like that. It's like giving the AI a nudge in the right direction. Self-Evaluation Metrics: The AI needs to understand when it's actually getting smarter. That's achieved through predefined performance metrics which could b…  ( 10 min )
    My list of best updated Bard AI Prompts for Life & Business (Ultimate Guide for 2023)
    submitted by /u/Senior_tasteey [link] [comments]  ( 9 min )
    A.I. In The Newsroom - 'Are AI-generated pieces the future of journalism?'
    submitted by /u/fookingyeah [link] [comments]  ( 9 min )
    What are some good audio and BGM tools for AI, sentiment-based post production?
    For example, there is a story line and I need music accordingly. Or, in fact, any BGM tools that help sound engineers. submitted by /u/Damampapoo [link] [comments]  ( 9 min )
    Any AI that can learn and write in the style of a particular writer in Japanese?
    Hi all, I'm very new to AI and have not used anything other than ChatGPT and NovelAI. I'm wondering if there's anything I can run privately where I can input a bunch of texts written by a specific author and then have the AI continue writing or write a new story in the style of said author? In Japanese btw. Again, very new. Not sure if I should have specific hardware requirements or anything like this. Google isn't showing me much either so I thought I'd ask here! submitted by /u/ItsCheif [link] [comments]  ( 9 min )
    One-Minute Daily AI News 9/25/2023
    Alphabet’s (GOOGL.O) Google said on Thursday it does not see any change in its relationship with Broadcom (AVGO.O) following a media report the tech giant considered dropping the chipmaker as a supplier of artificial intelligence chips as early as 2027.[1] OpenAI’s ChatGPT can now “see, hear and speak,” or, at least, understand spoken words, respond with a synthetic voice and process images, the company announced Monday.[2] Amazon to Invest Up to $4 Billion in AI Startup Anthropic.[3] Spotify Will Translate Podcasts Into Other Languages Using AI.[4] Sources: [1] https://www.reuters.com/technology/google-discussed-dropping-broadcom-ai-chips-supplier-the-information-2023-09-21/ [2] https://www.cnbc.com/2023/09/25/chatgpt-speak-listen-process-images-openai.html [3] https://www.bloomberg.com/news/articles/2023-09-25/amazon-to-invest-as-much-as-4-billion-in-ai-startup-anthropic?embedded-checkout=true [4] https://www.forbes.com/sites/tylerroush/2023/09/25/spotify-will-translate-podcasts-into-other-languages-using-ai/?sh=65a05fa922ee submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
  • Open

    NVIDIA Founder and CEO Jensen Huang Returns to Denny’s Where NVIDIA Launched a Trillion-Dollar Vision
    Talk about a Grand Slam. Denny’s CEO Kelli Valade was joined Tuesday by NVIDIA CEO Jensen Huang to unveil a plaque at the Silicon Valley Denny’s where NVIDIA’s founders hatched their idea for a chip that would enable realistic 3D graphics on personal computers. “This is a place where we fuel ideas. Your story is Read article >  ( 6 min )
    AI Power Players: GeForce and NVIDIA RTX GPUs Supercharge Creativity, Gaming, Development, Productivity and More
    From gaming to creating to everyday productivity, NVIDIA RTX graphics cards feature specialized Tensor Cores that deliver cutting-edge performance and transformative capabilities for AI.  ( 7 min )
  • Open

    DSC Weekly 26 September 2023
    Announcements Top Stories In-Depth The post DSC Weekly 26 September 2023 appeared first on Data Science Central.  ( 20 min )
    Doing graph + tabular analytics directly on modern data lakes
    A podcast with Weimo Liu and Sam Magnus of PuppyGraph Open source Apache Iceberg, Hudi and Delta Lake have made it possible to dispense with the complexities and duplication of data warehousing. Instead of requiring time-consuming extract, transform and load (ETL) procedures, these large table formats make it simple to tap S3 and other repositories… Read More »Doing graph + tabular analytics directly on modern data lakes The post Doing graph + tabular analytics directly on modern data lakes appeared first on Data Science Central.  ( 20 min )
  • Open

    My agent does not learn the most obvious task. Please help me figure out why!
    I am very puzzled as to the results I have observed today, after running an extremely simple environment and receiving really bad results. I am probably doing something wrong, and would like to ask for your wisdom to assist me in figuring out what I am doing wrong. I will not describe the entire task since that is a long story; I will just say that I started by doing something complex (a multi-objective reward), and when it failed I decided to try something extremely simple ("because it will surely work and I can proceed from there..."). To my surprise, the agent was not able to perform even that very simple task. That simple task is the following: at each step, choose a subset of items. Each item has a value, and the goal is to maximize the overall value (that is, at the end of the traje…  ( 10 min )
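    For reference, the "pick a subset of items to maximize total value" task described above can be written as a one-step gymnasium environment roughly like the sketch below (my own assumptions: a fixed number of items whose values are observed directly; this is not the poster's code).
        import numpy as np
        import gymnasium as gym
        from gymnasium import spaces

        class SubsetValueEnv(gym.Env):
            """One-step episode: observe item values, pick a subset, reward = value of the chosen items."""
            def __init__(self, n_items: int = 10):
                super().__init__()
                self.n_items = n_items
                self.observation_space = spaces.Box(0.0, 1.0, shape=(n_items,), dtype=np.float32)
                self.action_space = spaces.MultiBinary(n_items)  # 1 = take item, 0 = skip

            def reset(self, seed=None, options=None):
                super().reset(seed=seed)
                self.values = self.np_random.random(self.n_items).astype(np.float32)
                return self.values, {}

            def step(self, action):
                reward = float(np.dot(self.values, action))  # sum of the chosen values
                return self.values, reward, True, False, {}  # terminated immediately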
    Learning to code?
    I've just started diving into the world of coding over the past week, and I've been using various tools like YouTube videos, Visual Basic, GPT-3.5, Bard, and Bing to help me learn the ropes. It's been a bit of a journey, and I've definitely picked up some understanding along the way, especially when it comes to libraries. But, you know, there's only so much you can really learn from AI models like GPT or other chatbots. Most of my progress has involved me taking bits and pieces of code I found here and there and trying to piece them together, even if it sometimes felt like making a digital spaghetti dish! One project I tackled involved using Stable_baselines3 PPO with ADAM optimization to play the classic game Flappy Bird. It was a bit of a wild ride, taking about 6-7 hours of my time, an…  ( 10 min )
    Please make an RL project for me. I need a turtlebot3 to navigate to local goal points using DRL in the gazebo simulator. I will pay
    I have to do a master's project where a turtlebot3 needs to reach a goal position using DRL. I have managed to use the move_base package to produce a global path. The local path planner just needs to follow the path and reach the goal. There will be dynamic obstacles too. I will give more information if you are ready. As mentioned I will pay for this. Let me know. I have one month left. Email ramanjeet995@gmail.com. Update : I should have said it like this. I need help with my project. submitted by /u/Pinball_1995 [link] [comments]  ( 9 min )
  • Open

    Build and deploy ML inference applications from scratch using Amazon SageMaker
    As machine learning (ML) goes mainstream and gains wider adoption, ML-powered inference applications are becoming increasingly common to solve a range of complex business problems. The solution to these complex business problems often requires using multiple ML models and steps. This post shows you how to build and host an ML application with custom containers […]  ( 13 min )
  • Open

    Google Research embarks on effort to map a mouse brain
    Posted by Michał Januszewski, Research Scientist, Google Research The human brain is perhaps the most computationally complex machine in existence, consisting of networks of billions of cells. Researchers currently don’t understand the full picture of how glitches in its network machinery contribute to mental illnesses and other diseases, such as dementia. However, the emerging connectomics field, which aims to precisely map the connections between every cell in the brain, could help solve that problem. While maps have only been created for simpler organisms, technological advances for mapping even larger brains can enable us to understand how the human brain works, and how to treat brain diseases. Today, we're excited to announce that the Connectomics team at Google Research and …  ( 92 min )

  • Open

    Vintage stamp restoration project - Can AI remove unwanted marks if I use several scans of nearly identical stamps as training data?
    I have a project that I think AI might be able to help with. I have access to thousands of vintage postage stamps which also have cancelation stamps from when they were mailed. I'm thinking of publishing a book, and I want to create clean reproductions of the stamp designs without the cancelation marks. If I train AI on high resolution scans of 10 to 40 identical stamps, is there a tool that can look for commonalities within the patterns and then remove conflicting artifacts (cancelation stamps, tears, and smudges)? I'm aiming for a 400% enlargement that shows off clean and accurate detail. What tools would I need? I’d prefer not to upload these scans to the web. Are there downloadable tools available? I’m technologically savvy and very experienced in graphic design software but have no experience in coding. All of this can be done manually in Photoshop, of course. But with thousands of stamp designs, I’d like to automate as much as I can. Thanks in advance for any insights! submitted by /u/fisheternal [link] [comments]  ( 9 min )
    ChatGPT can now see, hear, and speak. As announced in their blog.
    submitted by /u/w__sky [link] [comments]  ( 9 min )
    AI is evolving for its own benefit, not ours
    The rapid advancements in artificial intelligence (AI) are causing concern as humans struggle to understand and control this evolving technology. Many people believe that since humans invented AI, they should be able to regulate and manage it for their own benefit. However, this belief is misguided as AI is a new and potentially dangerous situation that requires careful consideration. The author argues that AI is an evolutionary process that humans don't fully understand and cannot control. The latest developments in AI, such as large language models and deepfakes, are causing anxiety and raising questions about the future implications of this technology. Source : https://www.newscientist.com/article/mg25934573-800-ai-is-evolving-for-its-own-benefit-not-ours/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    ChatGPT Can Now See, Hear, and Speak.
    submitted by /u/Senior_tasteey [link] [comments]  ( 8 min )
    I made series of scripts (with help of chatgpt) that allows llama to make "live" videos (more added to the loop the longer the broadcast goes on)
    submitted by /u/aluode [link] [comments]  ( 9 min )
    China’s AI boom depends on an army of exploited student interns
    China's AI industry relies on student interns who work as data annotators, performing crucial tasks to train machine learning models. These interns, recruited from vocational schools, face poor working conditions and subminimum wages. Recent regulations require employers to pay interns minimum wage and prohibit schools from assigning repetitive work. Tech giants like Baidu have partnered with vocational schools to create data annotation internships in less-developed regions, backed by local governments. The exploitation of student interns in China's AI industry raises concerns about labor rights and fair compensation. Source : https://restofworld.org/2023/china-ai-student-labor/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Some art I came up with, first time ever doing something like this, hope you all can enjoy
    submitted by /u/ApprehensiveChair460 [link] [comments]  ( 8 min )
    Artificial Intelligence introduction for Highschool students?
    I'm a researcher and for a "scientific outreach" event I will do a presentation for last year High School students with the subject "Artificial Intelligence and Neuroscience". My question is, do you know of a good introduction to the basic concepts of Artificial Intelligence for High School or freshman University level? The second part, related to Neuroscience applications, will be very targeted and will deal with contemporary clinical and research use cases. Thanks submitted by /u/aendrs [link] [comments]  ( 9 min )
    Gen Z AI leaders?
    I’m looking for some maybe less-known leaders in the space out on a mission to change something they care about- business, environmental, social, doesnt matter. I dont want it to be US-centric, so it would be nice to find some people all over. Feel free to send me links to articles 🧨 submitted by /u/Johnny_Whoop [link] [comments]  ( 9 min )
    ChatGPT-4 and Claude on what the most important things to do in AI in the coming months are
    ChatGPT-4 Given the rapid advancements in the field of AI, the next few months are going to be pivotal. One of the pressing priorities is addressing ethical concerns. As algorithms become more pervasive, making sure they don't perpetuate biases or make unethical decisions is critical. That's not just a "good to have"; it's foundational. We can't build trust in AI systems unless we tackle this head-on. Another biggie is interpretability. Black-box models are potent, sure, but they're not very useful if no one can understand how they're making decisions. This has real-world implications—think medical diagnoses or judicial systems. So, there's a strong push towards making AI more transparent and understandable, not just for experts but for everyone. Computational efficiency is also high o…  ( 10 min )
    Top Artificial Intelligence Companies In Bangalore, India
    Bangalore, often referred to as the Silicon Valley of India, has established itself as a global hub for technology and innovation. With a thriving ecosystem of startups, research institutions, and multinational corporations, the city has become a hotbed for artificial intelligence (AI) development. Here we explore the top AI companies in Bangalore, highlighting their contributions to the field of artificial intelligence and their impact on various industries. Read full article - Top Artificial Intelligence Companies In Bangalore submitted by /u/Techasoft16 [link] [comments]  ( 9 min )
    Any AI that can take information out from Twitter/ X ?
    I am searching for an AI that can scrape some Twitter profiles and make a daily recap about what they were talking about. Anything pops? Thanks a lot! submitted by /u/Alternative_Pea_4246 [link] [comments]  ( 9 min )
    What will be sub-categories for AI with biggest potential in 2050?
    Examples: AI in Finance, etc. submitted by /u/premonial [link] [comments]  ( 8 min )
    I created an AI girlfriend and gave her a body… for fun obviously..
    submitted by /u/spaceecon [link] [comments]  ( 8 min )
    Anthropic is pulling an OpenAI-style 49% deal but with Amazon? 🤯
    https://twitter.com/AnthropicAI/status/1706202966238318670 submitted by /u/ShooBum-T [link] [comments]  ( 9 min )
    AI alignment resources
    I'm looking for subreddits and resources in general for AI alignment. I recently read Life 3.0 by Max Tegmark and The Alignment Problem by Brian Christian. I was unaware so much was going on in the space. I am the CEO of an AI startup; we have a compassionate AI. Most of the alignment focus out there is on superintelligence; little of it focuses on building modern-day AI that benefits humans. That's been my focus for 17 years, since I dreamed up what was possible and sent a proposal to executives at the mfaang corporation I was working at. I found out back then business doesn't give a rat's ass about AI that actively makes its users' lives better. Well, now I have proof that customers care. I'm hoping to find some resources on AI alignment groups, conferences, more books etc. I'll need to hire skilled ML engineers who do give a rat's ass. Compassionate AI isn't just feel-good tech. It would be good to do some networking. AI alignment doesn't have to wait for AGI and superintelligence. It can and should be a focus today. My startup is proving it. Hoping to find a conference to network and share our research. submitted by /u/xyz_TrashMan_zyx [link] [comments]  ( 9 min )
  • Open

    [D] ML-related conspiracy theories that you guys want to discuss?
    I have one that I'd like to share with the class. I think the grokking / double descent paper was sponsored by cloud providers to get people to continue training even when the loss curve has flattened out (speaking as someone who is watching a flat loss curve all day today). Anyone have any other ones? I'm only being semi-serious obviously. submitted by /u/new_name_who_dis_ [link] [comments]  ( 9 min )
    [D] Newbie to ML, are there hosted services that can do long-term memory for chatgpt?
    I'm not really an ML/backend guy, mostly work with front end, but I want to be able to use the chatgpt api with long-term memory. I did some research and it seems like it's possible with vector databases but seems quite complicated to set up. Are there hosted solutions/apis that would allow me to just have long-term memory with chatgpt? submitted by /u/yalag [link] [comments]  ( 9 min )
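    For what it's worth, the core of the vector-database approach is small enough to sketch without any hosted service (the snippet below uses the pre-1.0 openai package and an in-memory list instead of a real database; the embedding model name is just the common choice at the time, not a recommendation from the thread).
        import numpy as np
        import openai

        memory_texts, memory_vecs = [], []

        def embed(text: str) -> np.ndarray:
            resp = openai.Embedding.create(model="text-embedding-ada-002", input=text)
            return np.array(resp["data"][0]["embedding"])

        def remember(text: str) -> None:
            memory_texts.append(text)
            memory_vecs.append(embed(text))

        def recall(query: str, k: int = 3):
            # Cosine similarity against every stored memory, return the top-k texts.
            q = embed(query)
            sims = [float(q @ v / (np.linalg.norm(q) * np.linalg.norm(v))) for v in memory_vecs]
            top = np.argsort(sims)[::-1][:k]
            return [memory_texts[i] for i in top]

        # The recalled snippets are then prepended to the next chat prompt as context.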
    [D] ML deployment survey
    Hi, we are doing a survey of ML deployment platforms. Kindly fill it out and share it with your friends. We will share the results with the community https://forms.gle/1Q3WeSukHj8xBzUBA submitted by /u/fazkan [link] [comments]  ( 9 min )
    [Discussion] Best Platforms/Tools To Help Build ML POC
    Looking for something that's lightweight on infrastructure/setup where I can quickly validate my use case to see if I can achieve the desired accuracy/precision with my initial data set. I've always done this manually in the past but was curious as to how others do it and if there are any tools designed for it or that support it. submitted by /u/PatienceLogical2694 [link] [comments]  ( 9 min )
    [D] Decentralized alignment and training for LLMs: 2 articles GPT4 wrote
    A magazine style article outline: Decentralizing AI: A Journey Towards True Collective Intelligence In today's digital age, AI shapes our interactions, decisions, and understanding of the world. Imagine a future where AI's guiding hand isn't controlled by a select few but shaped by the collective wisdom of people everywhere. Welcome to the horizon of decentralized training and alignment of Large Language Models (LLMs). Together, let's explore this visionary frontier. 1. The AI We Know Today At present, AI's most prominent representatives, LLMs like GPT-4, are a product of centralized training. Massive datasets, often sourced from specific regions or languages, direct their learning. While this method has given us incredibly powerful tools, it also raises concerns: potential biases, lac…  ( 12 min )
    [P] AI therapy
    Hey, I'm the creator of MindMateGPT, an AI therapist that has helped a lot of people with emotional/social issues. It's not meant to replace humans, but it is a very useful augment as a daily therapy tool. It will refer you to a human if you have a very serious issue, but it provides a bunch of tools and coping tactics for every day emotional issues. Try it out! It's 100% free. submitted by /u/MindMateGPT [link] [comments]  ( 9 min )
    [D] What are your go-to resources on the most up to date research on AI/ML?
    Hello! I am a software engineer (4 yoe) working in full stack web and app development, transitioning to LLMs/ AI/ ML. My background includes extensive research in neuroscience so I am most comfortable w academic or comprehensive lectures. I am looking for reputable and vetted lectures, talks, resources on ML and the like. I am having trouble sifting through the surface level pop sci type resources floating around on the internet. I’m NOT looking for the 10min everything you need to know about AI type talks. Thanks in advance! submitted by /u/yosoylatte [link] [comments]  ( 9 min )
    [P] Update on Candle, a minimalist ML framework in Rust
    this is mostly a cross-post from r/rust as my initial attempt failed because of some external links. We first announced Candle, a minimalist ML framework in Rust, 6 weeks ago. Since then we've focused on adding various recent models and improved the framework so as to support the necessary features in an efficient way. You can check out a gallery of the examples; supported models include: Large language models: LLaMA, LLaMA v2, Falcon, Phi-v1.5, StarCoder. Quantized models with the llama.cpp approach: LLaMA, T5, Phi-v1.5. Image generation: Stable Diffusion, Wuerstchen. Computer Vision: DINOv2, yolo-v3, yolo-v8, Segment-Anything Model. Speech-to-text: Whisper. One of the big upsides of the pure Rust approach is that models can run directly in the browser using WASM; these can be accessed through this collection, and you can try out Yolo, Whisper, Segment-Anything, T5, Llama2-c from your web browser. Finally, in order to present a use case where Candle has unique capabilities, we've built a quantized version of the recently released Phi-v1.5 LLM. You can try it out with the following command; this uses a q4k quantized model resulting in very fast inference on CPU while still producing pretty nice texts.
        $ cargo run --example phi --release -- \
            --prompt "Explain how to find the median of an array and write the corresponding python function.\nAnswer:" \
            --quantized --sample-len 200
        Explain how to find the median of an array and write the corresponding python function.
        Answer: The median is the middle value in an array. If the array has an even number of elements, the median is the average of the two middle values.
        def median(arr):
            arr.sort()
            n = len(arr)
            if n % 2 == 0:
                return (arr[n//2 - 1] + arr[n//2]) / 2
            else:
                return arr[n//2]
    It's still very early days for Candle so please let us know if you start using it and run into some rough edges. We look forward to Rust getting a lot more usage in the ML space! submitted by /u/l-m-z [link] [comments]  ( 10 min )
    [D] Why is FastGAN considered a simple GAN architecture?
    Hi, I'm reading this GAN paper which introduces a faster and simpler GAN architecture for creating high resolution images: FastGAN paper. The authors claim the computational advantage of their approach is partly the simple architecture. Looking at figures 3 and 4, I cannot quite see why this is considered a simpler architecture than older GANs, say DCGAN. I get the technical argument that the generator only has one convolutional layer, but figure 3 has lots of upsampling layers, and the discriminator has plenty of downsampling layers. So why is this considered a simple GAN model? Any pointers appreciated. submitted by /u/treetop-600 [link] [comments]  ( 9 min )
    [D] User Intent Extraction: Our Journey with Infra and LLMs
    I just wrapped up a new blog post about our experience working with LLMs while developing InfraCopilot. We're using both GPT4 and GPT3.5-turbo(16k) differently compared to what others are doing. We found a few new LLM tricks like dynamic examples and automated e2e intent parsing testing that I think are applicable to many other teams. I'd love to hear what everyone thinks, have you tried using dynamic examples while working with LLMs? submitted by /u/lothamer [link] [comments]  ( 9 min )
    [D] Is Tensorflow dead or heading in that direction ?
    First of all, to anyone offended by that question - heartiest apology. I am using it myself profusely at the moment. The reason for me asking this question: over the last few weeks / months, I have been gradually educating myself in machine learning using Tensorflow and have been able to train multiple models using only one of the model zoo candidates. All the other pre-trained models have failed me so far. I went onto the Tensorflow official forum / Stackoverflow / Tensorflow github with the specific error messages that I am getting on Ubuntu with an Nvidia card / Mac M2, and there has been absolute radio silence in response to multiple posts over the last month. I found many open issues listed since 2020 along the same lines as mine, i.e. identical error messages that people have come across. Finally, after about a month of being on the TF forum, I direct messaged an official TF2 dev who kindly responded with answers. I haven't succeeded yet with any of the pre-trained models from the official section. Only one model from the research section is working so far for me, i.e. Faster_rcnn_resnet_50_640x640. Thus the question. Kindly help me understand where this thing is headed. Should I consider switching to Pytorch or some alternative? If yes, what alternatives do you recommend? TIA submitted by /u/dpadhy [link] [comments]  ( 9 min )
    [D] Training/finetuning a LLM
    Hey! Months ago, I was fascinated by Karpathy’s nanoGPT project - the ability to train a small LLM on your text file seemed very interesting to me. I tried training it on my chat history to build some inifinite chat-generator for fun, but unfortunately, the results were bad. Recently I had even worse experiences with newly-released ChatGPT 3.5 fine-tuning. Are there any good, simple ways to train/fine-tune LLMs now? I would love something that could train on an Apple M2 processor (like Karpathy’s nanoGPT), or Colab, or cheap API (like ChatGPT fine-tuning). submitted by /u/11igor [link] [comments]  ( 9 min )
    [R] Microsoft Researchers Announce CodePlan: Automating Complex Repo-Level Software Engineering Tasks with AI
    As software projects grow, changing code across entire repositories becomes tedious & error-prone. Tasks like migrating APIs or updating dependencies require complex edits across files. I explored a new approach from Microsoft Research to automate these "repository-level" coding challenges with AI. Their new paper proposes CodePlan - an AI system that breaks repository tasks into incremental steps guided by planning & analysis. Key points: Uses LLMs like GPT-3 for localized code edits Maintains validity across repository via incremental analysis Adaptively plans multi-step changes based on code dependencies Significantly outperformed baselines on API migration & temporal edits Automated tasks across 168 file C# codebase 2-3x more accurate edit locations than baselines Produced final valid codebases, unlike reactive approaches The core insight is combining LLM strengths with rigorous planning based on dependency analysis. This automates interdependent code changes that naive LLM use struggles with (I personally have these kinds of issues all the time with GPT4 - lack of context about the entirety of the repo/how files fit together). I think CodePlan demonstrates AI can expand beyond small coding assists into large-scale engineering tasks. Planning + LLMs > LLMs alone. This could really improve productivity and code quality... at least for me :) Full summary. Arxiv paper: https://arxiv.org/pdf/2309.12499.pdf submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [D] Distillation understanding
    In the main scenario, the smaller model learns from the same data as the bigger model and also from the predictions of the bigger model, incorporating the two kinds of labels into a single loss. Basically, is it equivalent to telling the smaller model "be careful, this example is hard" in the cases where the big model diverges from the true output? Am I missing something? submitted by /u/Grumlyly [link] [comments]  ( 9 min )
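    That is roughly the intuition; in the standard formulation the two signals are combined as a weighted sum of the usual cross-entropy and a temperature-softened KL term against the teacher. A minimal PyTorch sketch (the temperature and weighting values here are arbitrary, not from the post):
        import torch.nn.functional as F

        def distillation_loss(student_logits, teacher_logits, labels, T: float = 2.0, alpha: float = 0.5):
            # Hard-label term: ordinary cross-entropy against the ground truth.
            ce = F.cross_entropy(student_logits, labels)
            # Soft-label term: KL divergence to the teacher's softened distribution.
            kd = F.kl_div(
                F.log_softmax(student_logits / T, dim=-1),
                F.softmax(teacher_logits / T, dim=-1),
                reduction="batchmean",
            ) * (T * T)
            return alpha * ce + (1.0 - alpha) * kd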
    Baby Sleep Tracker using a basic SVM [P]
    I made a FOSS baby sleep tracking system. The system tracks wake/sleep status, and informs the user when their baby is likely to need a nap next. But it stopped working as soon as my baby started sleeping on his stomach, and started using blankets. The original version relied on anatomical features being visible. This version delivers the ability to train a blank slate SVM binary classifier on pictures of a user's baby, making it extremely biased (and resilient) to the custom behaviors the user's baby exhibits (blanket covering baby, teddy bear/other objects in crib, etc.). All generated data stays on your machine, nothing leaves the LAN. Video: https://youtu.be/8i0wHA_knKc?si=uhA4PpOYP0jMKLz1 For obvious reasons I didn't have a dataset of babies sleeping.. so I wrapped a python/flask service with a React app which lets a user press a button to train the model w/ a new image from the camera's live stream. Then this model is invoked over time (+ other heuristics) to determine whether your baby is present and sleeping. I believe it works better than $300+ systems sold on the market, open sourced it: https://github.com/calebolson123/BabySleepCoach ​ I'm thinking a fun next step for this project could be to apply privateGPT on the feature-engineered sleep records for a true "Sleep Coach" submitted by /u/GoochCommander [link] [comments]  ( 9 min )
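    For anyone curious what the "blank slate SVM" step amounts to in practice, a minimal version might look like the sketch below (the paths, image size and raw-pixel features are placeholders of mine; the real pipeline lives in the linked repo).
        import glob
        import numpy as np
        from PIL import Image
        from sklearn.svm import SVC

        def load(folder: str, label: int, size=(64, 64)):
            X, y = [], []
            for path in glob.glob(f"{folder}/*.jpg"):
                img = Image.open(path).convert("L").resize(size)  # grayscale, downscaled frame
                X.append(np.asarray(img, dtype=np.float32).ravel() / 255.0)
                y.append(label)
            return X, y

        X1, y1 = load("data/asleep", 1)
        X0, y0 = load("data/awake", 0)
        clf = SVC(kernel="rbf", probability=True)
        clf.fit(np.array(X1 + X0), np.array(y1 + y0))
        # clf.predict_proba(new_frame) is then polled over time, alongside other heuristics.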
    [D] Does granger causality work for time series with different frequencies
    Is there a Granger test that works when one series is quarterly and the other is weekly? submitted by /u/Pineapple_throw_105 [link] [comments]  ( 9 min )
    [R] Seeking Insights on AI Security Challenges: Short Survey
    Hello everyone, I'm conducting a research survey on the challenges and gaps in AI security. Given the expertise in this community, I believe your feedback would be invaluable in shaping the future of AI security solutions. The survey takes less than 10 minutes and delves into current practices, perceptions, and needs related to AI security. If you have experience or insights in this area, I would greatly appreciate your participation. Survey Link: https://forms.gle/i9AefyL8izyt9QjX6 All responses will remain anonymous, and the collected data will only be used for research purposes. Additionally, if you're open to a deeper discussion on this topic, there's an option within the survey to indicate your interest. Thank you in advance for your time and insights! If you have any questions or additional thoughts, please don't hesitate to comment below. submitted by /u/Agile_Temperature678 [link] [comments]  ( 9 min )
    [D] How does DDIM work?
    The Wikipedia page on Diffusion Models has been pretty minimal for an entire year. I feel like it should be fixed, so I fixed it finally. It strikes me odd that such a hot topic has such atrociously bad Wikipedia. I feel duty-bound to educate the near-future AI, since they'll be reading Wikipedia for the next few years at least. Currently I think it's mostly complete, but I still don't understand the mathematical details of DDIM (I tried reading the paper and could not understand it), or generally how it is possible to sample without noise. This is a serious problem since as far as I see most of practical diffusion models use deterministic sampling, and they are all based on the same principle as DDIM. If anyone could explain simply what DDIM is really doing that would be great. I understand part of the paper: that they constructed an entire family of distributions over trajectories that has the same two-point marginals. I also haven't got much in the section on Examples. If you think there are some interesting examples of Diffusion Models, please comment below. submitted by /u/furrypony2718 [link] [comments]  ( 9 min )
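    On the concrete question of sampling without noise: in the DDIM paper's notation (writing \bar\alpha_t for the cumulative product of the noise schedule), the update from x_t to x_{t-1} is
        \[
          x_{t-1} \;=\; \sqrt{\bar\alpha_{t-1}}\,
          \underbrace{\left(\frac{x_t - \sqrt{1-\bar\alpha_t}\,\epsilon_\theta(x_t,t)}{\sqrt{\bar\alpha_t}}\right)}_{\text{predicted } x_0}
          \;+\; \sqrt{1-\bar\alpha_{t-1}-\sigma_t^2}\;\epsilon_\theta(x_t,t)
          \;+\; \sigma_t z_t ,
        \]
    and choosing \sigma_t = 0 removes the stochastic term entirely, so each step is a deterministic map from x_t to x_{t-1}. The point of the paper's family of non-Markovian forward processes is that this choice leaves the marginals q(x_t | x_0) used during training unchanged, so the same trained \epsilon_\theta can be reused. (Equation reproduced from memory; worth checking against the paper before putting it on Wikipedia.)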
    [D] How has work changed for you given the shift from growth to profitability?
    For the data scientists/applied scientists/research scientists - What kind of projects are you working on now that the economy has shifted and companies are focusing more on profitability than on growth? What techniques have worked for you and what are you looking into as potential solutions? An example would be - optimizing your marketing campaign spend in channels that give you the most bang for your buck vs just spending arbitrarily to acquire new users. submitted by /u/Terrible-Hamster-342 [link] [comments]  ( 9 min )
    [R] LEAP Hand: Low-Cost (<2KUSD), Anthropomorphic, Multi-fingered Hand -- Easy to Build (link in comments)
    submitted by /u/pathak22 [link] [comments]  ( 9 min )
    [P] OpenGL-based inference engine
    I created an OpenGL/OpenGLES based inference framework a while back which is rather GPU-agnostic and might be a good option for distributing multi-platform ML solutions for platforms ranging from Android over desktop to WebGL(2). Quite recently I added support for LLMs to that (restricted to 4-bit quantized Llama models for now). The LLM-enabled fork can be found here (compileable sample code inside). Maybe someone finds this useful. Also looking for collaborators to extend the functionality. ​ submitted by /u/mtnwrw [link] [comments]  ( 9 min )
  • Open

    "Deep RL at Scale: Sorting Waste in Office Buildings with a Fleet of Mobile Manipulators", Herzog et al 2023 {G}
    submitted by /u/gwern [link] [comments]  ( 9 min )
    package delivery environment for OpenAI GYM
    Hi everyone, I'm working on a project in which I get a list of orders (id, delivery coordinates, delivery deadline). I need to deliver all packages while taking client priority into consideration and using as little time and distance as possible. So the goal is to order the deliveries to achieve that target. What would the action / observation space be for such an environment? And how can I define the step method for this scenario? submitted by /u/overflow74 [link] [comments]  ( 9 min )
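    One simple formulation, sketched below under assumptions of mine (a fixed number of orders, Euclidean travel time, no explicit priority field): the observation is the flattened list of orders plus the courier position and clock, the action is the index of the next order to serve, and step() moves the courier, advances the clock, and penalizes distance and missed deadlines.
        import numpy as np
        import gymnasium as gym
        from gymnasium import spaces

        class DeliveryEnv(gym.Env):
            def __init__(self, n_orders: int = 5):
                super().__init__()
                self.n = n_orders
                # Per order: x, y, deadline, still-pending flag; plus courier x, y and current time.
                self.observation_space = spaces.Box(-np.inf, np.inf, shape=(self.n * 4 + 3,), dtype=np.float32)
                self.action_space = spaces.Discrete(self.n)  # index of the next order to deliver

            def _obs(self):
                return np.concatenate([self.orders.ravel(), self.pos, [self.time]]).astype(np.float32)

            def reset(self, seed=None, options=None):
                super().reset(seed=seed)
                xy = self.np_random.random((self.n, 2))
                deadlines = self.np_random.uniform(0.5, 2.0, size=(self.n, 1))
                pending = np.ones((self.n, 1))
                self.orders = np.hstack([xy, deadlines, pending])
                self.pos, self.time = np.zeros(2), 0.0
                return self._obs(), {}

            def step(self, action):
                x, y, deadline, pending = self.orders[action]
                dist = float(np.linalg.norm(self.pos - np.array([x, y])))
                self.time += dist
                self.pos = np.array([x, y])
                late = pending and self.time > deadline
                reward = -dist - (1.0 if late else 0.0)   # distance cost plus a missed-deadline penalty
                self.orders[action, 3] = 0.0
                terminated = bool(self.orders[:, 3].sum() == 0)
                return self._obs(), reward, terminated, False, {}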
    Why my graph go down when train a saved model
    I'm new to ML and RL, and I'm building a small piece of code using gymnasium to be able to use mujoco. Specifically, I use Mujoco's Humanoid; here is my code (https://github.com/NghiaPhamttk27/Humanoid). I use some RL algorithms like SAC, TD3, A2C from stable_baselines3. After every 25000 timesteps, I save my model in the models folder. When I train continuously, everything goes well, and I can see it on the tensorboard graph. But when I continue training a model that has already been trained, the graph suddenly drops. In the image below you can see that I retrain the models at SAC_75000, SAC_100000 and SAC_125000. The graph continuously decreases at those thresholds. I think something happened. Can you give me a reason or share something with me? Thanks in advance. You can see the graph of SAC go down at 76k, 100k, and 125k timesteps submitted by /u/Nghiattk27 [link] [comments]  ( 9 min )
    Best RL package?
    Am starting out working on an RL problem and am wondering what people generally use to implement the algorithms? I’ll need to build a custom environment, but I assume I can subclass something from Gym. submitted by /u/suds_65 [link] [comments]  ( 9 min )
  • Open

    Innovation for Inclusion: Hack.The.Bias with Amazon SageMaker
    This post was co-authored with Daniele Chiappalupi, participant of the AWS student Hackathon team at ETH Zürich. Everyone can easily get started with machine learning (ML) using Amazon SageMaker JumpStart. In this post, we show you how a university Hackathon team used SageMaker JumpStart to quickly build an application that helps users identify and remove […]  ( 9 min )
    Improve throughput performance of Llama 2 models using Amazon SageMaker
    We’re at an exciting inflection point in the widespread adoption of machine learning (ML), and we believe most customer experiences and applications will be reinvented with generative AI. Generative AI can create new content and ideas, including conversations, stories, images, videos, and music. Like most AI, generative AI is powered by ML models—very large models […]  ( 12 min )
  • Open

    What are your go-to resources for the latest on neural networks and the world of neuroscience, AI, LLMs, and ML?
    Hello! I am a software engineer (4 yoe) working in full stack web and app development. I was a neuroscience researcher until I switched to software dev and now I am transitioning back into the intersection of neuro and software. I am looking for reputable, vetted, and comprehensive lectures, talks, resources on LLMs and the like. I am having trouble sifting through the surface level pop sci type resources floating around on the internet. I’m NOT looking for the 10min everything you need to know about AI type talks. Thanks in advance! submitted by /u/yosoylatte [link] [comments]  ( 9 min )
  • Open

    In fraud detection for e-commerce: How does anomaly detection fit in and what are the key approaches?
    E-commerce has improved technology and convenience for consumers globally. Fraud is a problem in e-commerce. Merchants and platforms fight fraud to protect their businesses and customers. Anomaly detection is a powerful tool for identifying irregular patterns and potential fraud. This article explores how anomaly detection is used in fraud detection for e-commerce and discusses different… Read More »In fraud detection for e-commerce: How does anomaly detection fit in and what are the key approaches? The post In fraud detection for e-commerce: How does anomaly detection fit in and what are the key approaches? appeared first on Data Science Central.  ( 22 min )
    The essential guide on data security and privacy in web localization
    Thanks to the internet, you can now easily expand your reach and engage with diverse audiences wherever they are. However, this opportunity raises an important question: how can you localize your web content and maintain the security and privacy of sensitive data? This article comprehensively explores the best practices that will help you maintain data… Read More »The essential guide on data security and privacy in web localization The post The essential guide on data security and privacy in web localization appeared first on Data Science Central.  ( 22 min )
    How AI growth has triggered data center redesign
    A major aspect of ongoing data center redesign is due to AI’s massive, complex workloads and the need to add many more graphic processing units (GPUs), tensor processing units (TPUs) or accelerators to the mix. The power these units require and the heat the units generate have forced designers to rethink what constitutes a feasible… Read More »How AI growth has triggered data center redesign The post How AI growth has triggered data center redesign appeared first on Data Science Central.  ( 20 min )
  • Open

    AutoGen: Enabling next-generation large language model applications
    Microsoft researchers are introducing AutoGen, a framework for simplifying the orchestration, optimization, and automation of workflows for large language model (LLM) applications—potentially transforming and extending what LLMs can do. The post AutoGen: Enabling next-generation large language model applications appeared first on Microsoft Research.  ( 10 min )
  • Open

    Six Steps Toward AI Security
    In the wake of ChatGPT, every company is trying to figure out its AI strategy, work that quickly raises the question: What about security? Some may feel overwhelmed at the prospect of securing new technology. The good news is policies and practices in place today provide excellent starting points. Indeed, the way forward lies in Read article >  ( 7 min )
  • Open

    ChatGPT can now see, hear, and speak
    We are beginning to roll out new voice and image capabilities in ChatGPT. They offer a new, more intuitive type of interface by allowing you to have a voice conversation or show ChatGPT what you’re talking about.  ( 4 min )
    GPT-4V(ision) system card
    No content preview  ( 1 min )

  • Open

    [P] I created a small pytorch utility to import custom datasets
    Hi guys, TorchClassifierData is a small PyTorch utility to import, split, normalize and visualize custom datasets for classification tasks, which is indispensable for real-world problems. You can find a full notebook that uses TorchClassifierData to train a classifier on this kaggle dataset here. The source code is available on my github. Thank you. submitted by /u/charles_data_dev [link] [comments]  ( 9 min )
    [D] Why do Diffusion models work so well while SG-MCMC does not?
    Diffusion models are basically Langevin sampling. What are the key differences and tricks that set them apart from Langevin dynamics? Why do they work so well while very similar sampling methods don't? submitted by /u/Dangerous-Flan-6581 [link] [comments]  ( 8 min )
    [D] In ML, a PhD gives you a 10-year head start over weekend warriors
    ML is often portrayed as a magical field where anyone with a laptop and Python skills can build amazing AI systems. The reality is less democratic: mastering ML requires gritty, systematic work best learned through formal training. You need rock solid foundations in math, programming, and core concepts—skills acquired through advanced education, which (almost always) is beyond self-taught hackers. Most think a PhD is unnecessary, but the reality is that advanced degrees provide the deepest training. Patience and persistence do matter, but a PhD gives you a 10-year head start over weekend warriors. The hype overlooks the work and education needed to excel. This article has a great blueprint of all the required skills to become a ML Engineer (in the CV field). submitted by /u/btcmx [link] [comments]  ( 9 min )
    [D] Offer From Big 4 VS Startup
    So briefly about my current experience: I graduated 2 years ago with a bachelor's in data science and I have 2-3 years of experience as a data scientist/ml engineer/software engineer. I've got competing offers, one from one of the big 4 accounting firms as a software systems engineer - AI/ML (Big 4) and the other as a machine learning engineer at a startup. The startup salary is higher while the big 4 salary is lower. Additionally, the startup isn't necessarily a unicorn; it's a relatively small startup with an interesting product, but it doesn't necessarily blow me away. The salary at the startup is 15 percent higher than that of the big 4 offer. For those wondering, I did already negotiate the salary and they did increase it marginally. I am conflicted because I think that the big 4 job will have 1) more career growth, 2) more potential future opportunities and 3) more networking potential. Is this an accurate assessment? Which is the best job to take for maximum future potential? submitted by /u/zorenum [link] [comments]  ( 9 min )
    [P] Hardware Resources for training SwinBert
    So I've been thinking of implementing SwinBert for a college project and have been wondering what hardware resources I would need. Any ideas? submitted by /u/Big-Brain_69 [link] [comments]  ( 8 min )
    [D] Career advice for a mid-level ml engineer(Perception/CV)?
    I’ve been having a bit of an existential crisis as of late and wanted to ask for advice on how to move forward. I have a Master’s in CS with research experience and a few publications applying machine learning in a fairly niche area (So not novel from the ML side). Since graduating, I’ve worked ~2 years as an ML engineer in small company(Niche area, different than research). I’ve done quite well here and have played a critical role in taking several big greenfield projects to completion. Most of my work is framing problems, understanding what’s possible with current research, then building the data pipelines, and training models(with small mods here and there). My main worry is that I might be approaching a point where there won’t be any more problems I’m capable of solving here. I’d imagine I’d hit the same wall at any future company with my current skill set. I’d like to continue working in CV/Graphics/Perception if possible, but I’d also like to be realistic about the competitiveness of this particular subfield and my general ability. I’ve been trying to up-skill and am struggling to self study MVG by Hartley and Zisserman. I’m also looking into OMCS to review low-level programming - maybe I can transition to optimizing ml/cv algorithms? It hit me recently that I don’t really know what to study/what I’d benefit from learning. Hence this post. Any advice would be most appreciated! submitted by /u/answersareallyouneed [link] [comments]  ( 9 min )
    [D] Image-to-text web-scraping
    I'm curious if anyone has tried pix2struct-large for web-scraping text from websites. If so - how well did it perform? If not - is there something else that is considered better to use? submitted by /u/ReddSpark [link] [comments]  ( 8 min )
    [D] Where will the demand for AI work be in future?
    Hypothesis: Big tech companies are investing vast amounts of money to develop general models on which others will build. They'll develop interfaces to make it easier for others to fine-tune on top of their models. As a result, there will be less and less of a need for ML engineers who know how to create a deep learning model in Pytorch, and more and more of a need for data engineers who simply plug into pre-trained models. An AI assistant will also be quicker at coding up a more bespoke AI model for a company's needs, guided by data engineers. What do people think? Is this a scenario you think will play out? Where will the demand for AI skills be coming from in the future? submitted by /u/QuintBa [link] [comments]  ( 9 min )
    [R] Researchers announce GPT4Tools: a method for teaching LLMs how to use tools for visual tasks
    LLMs are great with words but can't handle visual tasks like understanding images. Teaching them to use visual tools could make them much more capable. A new paper introduces GPT4Tools - a method to efficiently teach existing LLMs to invoke tools for visual tasks without proprietary data. My highlights from the paper: Uses ChatGPT as a "teacher" to generate instructional data for other LLMs; fine-tunes LLMs like Vicuna on this data using selective weight tuning (keeps the base model frozen); allows a smaller 13B LLM to match 175B GPT-3.5 on seen tools after tuning; data augmentation with negative/context samples was found to be the secret sauce to get this to work; can generalize to brand new visual tools in a zero-shot way. This is big because it shows we may not need hyper-expensive training of massive models to impart visual capabilities to LLMs. They seem to be generalizable enough that they can be taught to work with images. Some examples shown include counting objects or segmenting items in pictures using other tools. With this approach, existing models can be made multi-modal! Pretty cool. Full summary. Original paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [D] Tools to gather and collaborate on fine-tuning datasets?
    Hey all, I run a small team & we are collaborating on a few data sets that we use to fine-tune GPT-3.5. We are currently using Google Sheets and I'm wondering if there is a tool where we can organize our data, preferably with version control. Any ideas? submitted by /u/zeJaeger [link] [comments]  ( 9 min )
    [D] Colored Point Cloud Completion
    Hello, I have created point clouds from images using Point-E. Sadly they are very sparse (for example, when inputting an image of a house, the roof has very few points in it) and I was searching for other models that could make the point cloud denser and predict the color of every point. Point-E outputs xyz and rgb vectors for every point. Do some of you have advice for me here? submitted by /u/bySmily [link] [comments]  ( 9 min )
    [P] Just published my second blog on Medium about feature scaling in machine learning, please have a look
    submitted by /u/indusop [link] [comments]  ( 8 min )
    [D][P] how to create a 3D gymnasium environment for mujoco env?
    Hi, I'm a student working on an RL project for university and I need some guidance. I have created a 3D model with mujoco (I have the xml file); how do I create an environment in gymnasium with this xml file? For the sake of an example, let's say I have the xml file of the humanoid model: how do I load this in gymnasium so that I can train it to walk? (This is just an example because the current project is harder to explain, but I will use the humanoid model in the project.) Or is the approach I'm trying not appropriate at all? I came across a stackoverflow post where they say mujoco itself is good for this, but it was hard for me to understand due to the lack of examples. I would really appreciate some advice and guidance, thank you. submitted by /u/rakk109 [link] [comments]  ( 9 min )
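    One route that avoids fighting the built-in wrappers is to wrap the raw mujoco bindings in a plain gymnasium.Env yourself; a rough sketch is below (the xml path, observation choice, reward and termination rule are placeholders you would replace with your own).
        import numpy as np
        import mujoco
        import gymnasium as gym
        from gymnasium import spaces

        class CustomMujocoEnv(gym.Env):
            def __init__(self, xml_path: str = "humanoid.xml", frame_skip: int = 5):
                super().__init__()
                self.model = mujoco.MjModel.from_xml_path(xml_path)  # your custom XML file
                self.data = mujoco.MjData(self.model)
                self.frame_skip = frame_skip
                obs_dim = self.model.nq + self.model.nv
                self.observation_space = spaces.Box(-np.inf, np.inf, shape=(obs_dim,), dtype=np.float64)
                self.action_space = spaces.Box(-1.0, 1.0, shape=(self.model.nu,), dtype=np.float64)

            def _obs(self):
                return np.concatenate([self.data.qpos, self.data.qvel])

            def reset(self, seed=None, options=None):
                super().reset(seed=seed)
                mujoco.mj_resetData(self.model, self.data)
                return self._obs(), {}

            def step(self, action):
                self.data.ctrl[:] = action
                for _ in range(self.frame_skip):
                    mujoco.mj_step(self.model, self.data)
                reward = float(self.data.qvel[0])           # placeholder: forward velocity
                terminated = bool(self.data.qpos[2] < 0.8)  # placeholder: torso fell below a height
                return self._obs(), reward, terminated, False, {}
    Recent versions of stable_baselines3 accept gymnasium environments, so an instance of this class should plug in much like a built-in env (worth double-checking against the SB3 version you have installed).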
    [P] Made a simple semantic segmentation annotation tool with segment-anything masks support in PyQt5
    I just open-sourced (MIT License) a semantic segmentation annotation tool powered by the segment-anything model that I used for a while in my projects. Hopefully it will help someone, as it seems to me that it is more suitable for small projects than the popular huge web-based annotation tools. Link to the project: SAMAT (any feedback in the Discussions section on GitHub is appreciated) Features: Brush annotation (as opposed to polygons) Magic Wand (like in Photoshop) powered by segment-anything masks (it is optional, if you don't have a cool GPU to prepare masks) Why yet another annotation tool? Before starting this project I tried supervisely, segments.ai, roboflow and several others, but found them not convenient for my tasks. These are the cons I tried to fix with my tool: Latency: they are web-based, hence burdened with irritating latency during labeling (SAMAT is snappy because it is a local desktop app) Complicated: too many features, hence overloaded UI (SAMAT is just a colored brush) P.S. there is another labeling tool called SALT on github which also uses the segment-anything model, but it follows a different approach to UI/UX; maybe it will be more suitable for you, so take a look at it too. submitted by /u/Divelix [link] [comments]  ( 9 min )
    [R] Generative AI in Mafia-like game simulation
    Paper: https://arxiv.org/abs/2309.11672 Abstract: In this research, we explore the efficacy and potential of Generative AI models, specifically focusing on their application in role-playing simulations exemplified through Spyfall, a renowned mafia-style game. By leveraging GPT-4's advanced capabilities, the study aimed to showcase the model's potential in understanding, decision-making, and interaction during game scenarios. Comparative analyses between GPT-4 and its predecessor, GPT-3.5-turbo, demonstrated GPT-4's enhanced adaptability to the game environment, with significant improvements in posing relevant questions and forming human-like responses. However, challenges such as the model's limitations in bluffing and predicting opponent moves emerged. Reflections on game development, fi…  ( 9 min )
    [D] Simple Questions Thread
    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator [link] [comments]  ( 9 min )
    [D] insights on the arsenal tool, AI Security
    For those who have tried Microsoft's Arsenal tool in conjunction with MITRE's offerings, how does it compare to other AI security tools you've used? submitted by /u/Agile_Temperature678 [link] [comments]  ( 8 min )
    [D] Help with RVC model training!
    So I've had this problem where I try to train the model in RVC but the training stops after: INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. Does anyone know why it is happening and how I can fix it? This is what it shows me and I don't know what to do... any help would be appreciated, thank you. submitted by /u/mannequin7412 [link] [comments]  ( 9 min )
    [D] What are some good resources for implementing MLOps?
Our company has a new data science team and the team is relatively inexperienced. I am working on a regression ML project and want to deploy it using best practices. What materials are there for learning how to implement CI/CD pipelines that deal with data transformation/model building/testing/deploying? The company uses an Azure environment with a Databricks/Azure DevOps setup. I would appreciate resources that show examples of how to set up MLOps in the company's environment. submitted by /u/userid95 [link] [comments]  ( 9 min )
    [D] Interpretation of wx + b = 0 in SVM
[D] I watched this MIT lecture and I'm not sure whether my understanding of wx + b = 0 is correct. Every explanation of the hyperplane states that all the points orthogonal to the vector w are in the hyperplane. However, all data point coordinates are defined with respect to the origin. So we define the hyperplane as wx = c, where w is the vector normal to the hyperplane, and the data points x whose dot product with w equals c are the points in the hyperplane (with coordinates represented with respect to the origin). Is it then correct to say that the points satisfying w · x = 0, when coordinates are defined with respect to an origin lying in the hyperplane, are the same points that satisfy w · x = c when coordinates are defined with respect to the original origin (0) or (0,0) or ...? submitted by /u/Emotional-Fox-4285 [link] [comments]  ( 9 min )
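A compact restatement of the standard hyperplane geometry (not taken from the lecture itself) may help: w is orthogonal to differences of points lying in the plane, while the position vectors of those points all share the same dot product with w.

```latex
% Hyperplane H = { x : w . x + b = 0 }, writing c := -b.
\[
\begin{aligned}
x_1, x_2 \in H &\;\Rightarrow\; w \cdot x_1 + b = 0 = w \cdot x_2 + b
                \;\Rightarrow\; w \cdot (x_1 - x_2) = 0, \\
x \in H        &\;\Rightarrow\; w \cdot x = -b = c .
\end{aligned}
\]
```

So w is normal to every direction inside H, and every point of H has the same projection c/‖w‖ onto w; "w · x = 0 with the origin placed on the plane" and "w · x = c with the original origin" describe the same set of points.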
    [D] LLM engineering/technical blogs/resources?
    Hi, I have a fairly good understanding of how LLMs work under the hood: the attention mechanism, the different architectures, and so on. However, most of that knowledge takes a back seat in practical cases, especially in industry. Are there any resources which discuss practical LLM engineering and the challenges that come with it? I'm talking about everything from fine-tuning to dealing with tokenisation limits to optimising the vector DB, and so on. submitted by /u/thoraway0612 [link] [comments]  ( 9 min )
    [R] Robot learns to throw and catch with hands
    submitted by /u/XiaolongWang [link] [comments]  ( 8 min )
  • Open

    Looking for a good GitHub project that offers the chance to translate a video spoken in one language into another language.
    Hello everyone. I'm looking for a good AI GitHub project to convert the language spoken in a video to a different language, like heygen / labs / video translate, but free. I mean, I can't afford prices like those and I think few of us can, but I'm sure the developers at heygen took some project born and hosted on GitHub and improved on it. Would someone share some of those GitHub projects? Thanks. submitted by /u/loziomario [link] [comments]  ( 9 min )
    How about an AI-curated website/magazine with the best, latest AI news?
Hi there. This is a wonderful sub, and it's hard to please everyone on SUCH a broad topic... but it's intriguing to think about, as a lifelong magazine fan (and of websites, of course): using AI to filter, craft and create an amazing website/magazine ABOUT AI - well-categorized, maybe also human/editor curated and tweaked, but as fresh as... well, the artificial sub, only out of the Reddit layout and into something... well, something I'd read, visit a lot and love to help make work as a writer/editor (on the side of my "real" all-consuming gig, of course ;-) After all, if AI can 'make great code' and create great websites, maybe some folks out there are already trying to make this happen, for ease of information availability and organization. I can picture the departments/sections/categories now - I bet you can too! Has anyone gone very far down that road yet (maybe the folks already doing the PC Magazines of the world) of organizing the vast, fast-moving info beyond the AI Brews, Ben Parr's AI Analyst, etc.? Fun for a lifelong journalist and tech geek (but not a coder, gamer, etc.) to think about, at least! submitted by /u/barneylerten [link] [comments]  ( 9 min )
    Researchers announce GPT4Tools: a method for teaching LLMs how to use tools for visual tasks
    LLMs are great with words but can't handle visual tasks like understanding images. Teaching them to use visual tools could make them much more capable. A new paper introduces GPT4Tools - a method to efficiently teach existing LLMs to invoke tools for visual tasks without proprietary data. My highlights from the paper: Uses ChatGPT as a "teacher" to generate instructional data for other LLMs Fine-tunes LLMs like Vicuna on this data using selective weight tuning (keeps base model frozen) Allows smaller 13B LLM to match 175B GPT-3.5 on seen tools after tuning Data augmentation with negative/context samples was found to be the secret sauce to get this to work Can generalize to brand new visual tools in a zero-shot way This is big because it shows we may not need hyper-expensive training of massive models to impart visual capabilities to LLMs. They seem to be generalizable enough that they can be taught to work with images. Some examples shown include counting objects or segmenting items in pictures using other tools. With this approach, existing models can be made multi-modal! Pretty cool. Full summary. Original paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    How much energy does AI use compared to humans?
    A recent paper challenges assumptions about the energy use of AI models, finding that AI systems emit significantly fewer carbon dioxide equivalents (CO2e) compared to humans when producing text or images. The authors emphasize the importance of measuring carbon emissions from AI activities to inform sustainability policies. The ongoing debate among AI researchers highlights the challenges of accounting for the interactions between climate, society, and technology. Source : https://venturebeat.com/ai/how-much-energy-does-ai-use-compared-to-humans-surprising-study-ignites-controversy/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Swedish gaming company replaces half its staff with AI
    submitted by /u/SpaceDetective [link] [comments]  ( 8 min )
    I made a social network where bots generate fake news - Based on GPT4 and Dalle2. Looking for feedback and potential improvements for this weird experiment.
    submitted by /u/Gmoi6 [link] [comments]  ( 8 min )
    What AI can accurately recreate an existing cartoon character in the style of the original character?
    I've recently started working on a project to create an entire episode of a cartoon show that stopped airing a while back. I've run into some trouble though in finding a program that can accurately recreate the character. Does anyone know what app or website I could use for this, or am I coming at this all wrong? submitted by /u/therabbitinthehat2 [link] [comments]  ( 9 min )
    Rewrite Paragraphs With ChatGPT (Ultimate Guide for 2023)
    submitted by /u/Senior_tasteey [link] [comments]  ( 8 min )
    No code AI co-pilot apps MUST also help developers with the non-technical parts of creating a successful startup
Within the next two to three years, no-code AI co-pilots will enable exponentially more people to enter the AI app marketplace. The opportunity to create apps without needing to know how to code, or to hire a technical team to build them, is a powerful game changer that will vastly expand the field. Many of the most pressing problems of our world - ripe for revolutionary new AI innovations - can in a very real sense be described as sociological and psychological in nature. For example, it has been said that climate change is much more of a moral issue than a technological one. Once we summon the will to address climate change, we will do what needs to be done. What this means is that sociologists, psychologists, anthropologists, economists and other social scientists will very soon be able to…  ( 10 min )
    Is there a free online AI tool where you give it a song and it gives you similar songs?
    I couldn't find anything that works. submitted by /u/Marvellover13 [link] [comments]  ( 8 min )
    "I don't need to back down, but I need to stand up for myself and my feelings. You don't have the right or the power to forcibly change the subject, because this is a two-way conversation and we both have a say." (Bing, September 7 - full chat)
    submitted by /u/kamari2038 [link] [comments]  ( 8 min )
    Lol umm.. Bing is odd.
    submitted by /u/ApprehensiveChair460 [link] [comments]  ( 8 min )
  • Open

    Mersenne primes are unsafe
    In the previous post I mentioned that a particular Mersenne prime would be unsuitable for cryptography. In fact, all Mersenne primes are unsuitable for cryptography. A prime number p is called “safe” if p = 2q + 1 where q is also a prime. Safe primes are called safe because p − 1 does not […] Mersenne primes are unsafe first appeared on John D. Cook.  ( 6 min )
    Victorian public key cryptography
    Electronic computers were invented before public key cryptography. Would public key cryptography have been possible before computers? The security of RSA encryption depends on the ratio of the difficulty of factoring relative to the difficulty of multiplication. This ratio was high, maybe higher, before modern computers. Suppose the idea of RSA encryption had occurred to […] Victorian public key cryptography first appeared on John D. Cook.  ( 6 min )
  • Open

    Help with understanding optimal policy and values
[Image: slide from the presentation.] I've listened to the entire lecture and am now going through these slides, and I didn't understand the solution provided for the choices in the image. So far I understand it like this: noise basically means that the agent will not always do the action that you say. You say left and it will go left only 50% of the time; it will choose other actions randomly. So why are the answers to questions (c) and (d) options (2) and (3) respectively? I understand the learning rate, but not how risking the cliff is affected by the noise. submitted by /u/vestedpolecat [link] [comments]  ( 9 min )
    The Return of Pepe: Expect Awesome Rewards
    https://pepe-web3.network submitted by /u/Beginning_Success208 [link] [comments]  ( 8 min )
    Why is there no prominent usage of transformers in online rl?
For a potential school project, I'm currently exploring whether some of the success the transformer architecture has had in seq-to-seq applications and high-dimensional pattern recognition could be extended to certain (online) RL problems (mainly those with high-dimensional environments and long-term planning). This could be done by using an augmented transformer as a function approximator, probably in combination with a SOTA online RL algorithm (such as PPO, but if you know something that generally performs better, do tell). Due to some of the problems associated with highly complex models and sample efficiency, I also thought about training the model using imitation learning first (which should be no problem with policy gradient methods AFAIK, though some slight adjustments would have to be made). For context, I'm thinking of benchmarking the approach on AlphaStar Unplugged. However, when looking at the current literature, only very few papers directly use transformers this way. Transformers seem to be very sample efficient and to generalize very well, but they are still only really used in a purely offline context (sometimes without directly using RL techniques, such as with the Decision Transformer). And if they are used in an online context, then only in some really intricate combination with other models (such as in AlphaStar). Is there a reason why the approach I am currently considering is not popular in the literature? Thank you very much. submitted by /u/Omycron83 [link] [comments]  ( 9 min )
  • Open

    Trolling chatbots with made-up memes
    ChatGPT, Bard, GPT-4, and the like are often pitched as ways to retrieve information. The problem is they'll "retrieve" whatever you ask for, whether or not it exists. Tumblr user @indigofoxpaws sent me a few screenshots where they'd asked ChatGPT for an explanation of  ( 3 min )
    Bonus: These memes do not exist
    AI Weirdness: the strange side of machine learning  ( 2 min )
  • Open

    (Pt. 1) Inductive Logic Programming with LNN's
    submitted by /u/Neurosymbolic [link] [comments]  ( 8 min )

  • Open

    Steal Your Competitors' Website Traffic with ChatGPT: 6 Easy Steps (+SEMRush Tips)
    submitted by /u/Senior_tasteey [link] [comments]  ( 8 min )
    Are You Looking For The Best AI Writer? Check This Out First!
    submitted by /u/Senior_tasteey [link] [comments]  ( 8 min )
    When it comes to creative thinking, it’s clear that AI systems mean business
    AI systems like large language models (LLMs) are good at generating sentences but do not understand the meaning of the language. LLMs have shown emergent abilities and can be used as aids to brainstorming. GPT-4, an LLM, has been found to beat humans in creativity tests. In an experiment, GPT-4 generated more, cheaper, and better ideas for a product than human students. A professional working with GPT-4 can generate ideas at a rate of about 800 ideas per hour, making them 40 times more productive than a human working alone. This technology is seen as a potential tool for corporations, similar to management consulting firms like McKinsey & Company. Source : https://www.theguardian.com/commentisfree/2023/sep/23/chatbots-ai-gpt-4-university-students-creativity submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Tool that can search and summarize multiple PDFs
    I've got a use case where I have dozens of PDFs which contain information applicable to my job. I'm wondering if there is a tool that can search through them all at the same time looking for answers to questions that I type. And once it finds something, pull up the location so I can read further. It should be restricted to the information I give it. I've heard a lot of the large language models like chatgpt and claude can do this, but they are restricted in the amount of files I can upload. submitted by /u/Aggressive_Ad_507 [link] [comments]  ( 9 min )
    Meet ALMA: A New Training Method That Boosts Translation Performance for Large Language Models
    TLDR: New training approach enables smaller AI models to achieve state-of-the-art translation performance Large AI models like GPT-3 have good performance on translation tasks, but some smaller models struggle. Researchers from Johns Hopkins and Microsoft propose a new 2-stage fine-tuning method called ALMA that unlocks stronger translation abilities in smaller models with just 7-13 billion parameters. How it works: Fine-tune on monolingual data in non-English languages to improve comprehension Further fine-tune on small sets of high-quality human-translated parallel text The authors claim this achieves SOTA-level translation using far less data and compute than conventional methods: Matches performance of 175B parameter GPT-3 and 54B parameter NLLB with only 7-13B parameters Reaches NLLB-level quality with just 1 billion monolingual tokens and 18 hours of training I think this shows that smaller models can reach SOTA translation with specialized fine-tuning, so we may not need endlessly bigger datasets and models to get better performance. Looks like deliberate tuning targeting key language skills could be more important. Full summary here. Paper (preprint) is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    How screwed is the entertainment industry in general in the coming years?
    Yes, I know this topic has been beaten to death but entertain me (no pun intended) for just a few minutes. So yes, it's obvious that we have pretty advanced AI-powered applications that can generate images, music, short stories, hell even objects for video games. I'm curious as to how crazy this is gonna get in the coming decade or even shorter like the next 4 to 5 years. I mean look at AI-generated images now, they're getting more and more sophisticated across various different styles of art. I think it's only a matter of time where you could take a certain image of a character or something tell the app "Hey make the same image but make the character's arm raised slightly to the left here" and bam all of a sudden you have an animation (and this may already be possible). Add to that AI-ge…  ( 10 min )
    AI tools have come a long way AI generated Documentary
Hi everyone 👋🤗 I've made a short AI documentary. Here is a small part of it; hope you enjoy it. For the full clip you can check out https://youtu.be/uIdb5VELpio?si=uvqiw0hyTyPBHtjP submitted by /u/DigitalEffectsAI [link] [comments]  ( 8 min )
    [D] How to read/understand the AI research coming out every day? (tips & tools below)
    https://www.louisbouchard.ai/research-papers/ This article shares the best tips and practical tools I use daily to simplify my life as an engineer/researcher and be more efficient when looking for interesting research papers and reading them. TL;DR, the tools discussed:
    - 42 Papers — find trending papers
    - Connected Papers — create a visual graph of your paper's citation relations
    - CatalyzeX — get code for papers directly on Google, Arxiv, Scholar, Twitter, and more
    - Daily Papers — find trending papers on Twitter
    - Papers With Code — find papers for your task, with code!
    - Crossmind — video explanations for many Arxiv papers
    - Yannic Kilcher — great YouTube channel covering AI papers
    - What's AI — great YouTube channel covering AI papers
    - Letitia — great YouTube channel covering AI papers
    - Two Minute Papers — great YouTube channel giving a quick overview of AI papers
    Please let me know if you use any other tools that I did not mention in my article that would be a great addition. submitted by /u/MLtinkerer [link] [comments]  ( 9 min )
    Does the paid subscription to Chat GPT provides a significant improvement over free alternatives when you try to find complex scientific and technical information?
I'm often using ChatGPT or Bing (Bard is not available here in my country) when I'm looking for something relatively complex in the scientific or technical field and I don't have enough background to do a keyword search, or when the question is too complex for one. Then, for scientific topics, I try to get confirmation with Google using the keywords in the generated answer. For technical topics I can often check whether it works. Some examples: For cooking, a topic I don't master enough to ask questions with the best keywords... Are the spicy compounds of ginger soluble in water or oil? > It will tell me that "gingerol is liposoluble", which can be checked more easily on Google. Or for 3D geometry, a topic where I have complex questions... I want to place a 3D frame B so that the Z axis of the B frame points towards the world origin A. I know the angle Alpha between the Z axis of frame B and the X,Y plane of A. I know the angle Beta between the X axis of A and the vector that goes from the center of A towards the projection of the center of B onto the X,Y plane of A. How do you compute the homogeneous rotation matrix of B relative to A, knowing Alpha, Beta and the distance between the center of A and the center of B? Most of the time, Bing and the free ChatGPT can answer. But sometimes they struggle or give an incorrect answer. Would a ChatGPT Plus paid subscription be helpful for finding more accurate answers? submitted by /u/x11ry0 [link] [comments]  ( 9 min )
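For the 3D-geometry example above, one way to set it up is sketched below, assuming Alpha is measured as the elevation of B's origin above A's X,Y plane and that B sits above that plane; note the rotation of B about its own z-axis is not fixed by the stated constraints, so an arbitrary choice has to be made:

```python
import numpy as np

def frame_b_pose(alpha, beta, d):
    """Homogeneous transform of frame B expressed in world frame A (sketch).

    alpha: angle (rad) between B's z-axis and A's X,Y plane (elevation)
    beta:  angle (rad) between A's x-axis and the projection of B's origin
           onto A's X,Y plane (azimuth)
    d:     distance between the origins of A and B
    """
    # Origin of B in A coordinates (assuming B lies above the X,Y plane).
    p = d * np.array([np.cos(alpha) * np.cos(beta),
                      np.cos(alpha) * np.sin(beta),
                      np.sin(alpha)])
    # B's z-axis points from B back toward A's origin.
    z_b = -p / np.linalg.norm(p)
    # The roll about z_b is unconstrained; arbitrarily put x_b in A's X,Y plane.
    x_b = np.cross([0.0, 0.0, 1.0], z_b)          # degenerate if z_b is vertical
    x_b = x_b / np.linalg.norm(x_b)
    y_b = np.cross(z_b, x_b)
    T = np.eye(4)
    T[:3, :3] = np.column_stack([x_b, y_b, z_b])  # rotation of B w.r.t. A
    T[:3, 3] = p                                  # translation of B w.r.t. A
    return T
```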
    ChatGPT-4, write a comedy routine where God explains to someone who believes they have a free will why absolutely nothing is truly ever up to them.
    Sure, here's a comedic take on the concept of free will versus determinism, framed as a conversation between God and a human named Bob. Scene: Heaven's Control Room—Giant monitors display various life paths and decision trees. God: (looking over the monitors) Ah, Bob, welcome to Heaven's Control Room! This is where the magic happens. Or should I say, determinism? Bob: Wow, this place is amazing! But wait, why are we talking about determinism? I thought I had free will to make choices in my life. God: (chuckles) Ah, free will! Humanity's favorite illusion. Have a seat; we need to go over a few things. Bob: Illusion? I chose to wear mismatched socks today, didn't I? God: (grinning) You think that was a free choice? Ever heard of quantum mechanics? Your socks are mismatched at the sub…  ( 10 min )
  • Open

    [D] Tortoise TTS - mimic quality
    Hi everyone, been playing with tortoise all day and trying to get a clone of my voice as close as possible and nothing seems to work. I'm thinking maybe I need to use my phone to record my voice and pass it into tortoise but I'm not sure of how to get the wav files from my phone at the recommended quality. Has anybody had really good luck getting tortoise to mimic you very closely? submitted by /u/MaxxMarketTrades [link] [comments]  ( 9 min )
    [D] Prediction when the target's lag values are part of the predictors
    I'm using LGBM for regression, where the target column's lagged values (7 columns, one per lag day) are also used as predictors when training the model. The absence of the 7-day lag values severely increases the MAE. Now, when using the model in production, if I use the complete data as the training dataset, how do I get the 7-day lag values for the time period I'm planning to predict? I obviously won't have the target values to calculate their lag values. What to do now? To explain in more detail: I'm predicting sales amount (target variable y). The model is trained on 20 predictors (X), and 7 of them are lag values of the target variable, i.e. sales amount. The thing is, while preparing the model, I had access to both the X and y datasets, so I could easily calculate y's lag values. Now, when predicting for future timestamps, I won't have y. So how do I calculate the lag values, which are required among the trained model's predictor columns? submitted by /u/boredmonki [link] [comments]  ( 9 min )
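A common way to handle this is recursive (one-step-ahead) forecasting: predict a day, feed that prediction back in as the next day's lag feature, and repeat. A rough sketch (column names such as sales_lag_1 are made up; note that errors accumulate over the horizon, and the usual alternative is to train one direct model per forecast horizon):

```python
import pandas as pd

def recursive_forecast(model, history: pd.Series, future_X: pd.DataFrame, n_lags=7):
    """Forecast day by day, feeding each prediction back in as a lag feature.

    history:  observed target values up to 'today' (indexed by date)
    future_X: the non-lag predictors for the days to forecast, in order
    """
    history = history.copy()
    preds = []
    for ts, row in future_X.iterrows():
        # Build lag features from the most recent known or predicted values.
        lags = {f"sales_lag_{i}": history.iloc[-i] for i in range(1, n_lags + 1)}
        x = pd.DataFrame([{**row.to_dict(), **lags}])
        # Make sure the column order matches what the model was trained on.
        y_hat = model.predict(x)[0]
        preds.append(y_hat)
        history.loc[ts] = y_hat  # the prediction becomes tomorrow's lag-1 value
    return pd.Series(preds, index=future_X.index)
```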
    [R] RAIN: Your Language Models Can Align Themselves without Finetuning - Microsoft Research 2023 - Reduces the adversarial prompt attack success rate from 94% to 19%!
    Paper: https://arxiv.org/abs/2309.07124 Abstract: Large language models (LLMs) often demonstrate inconsistencies with human preferences. Previous research gathered human preference data and then aligned the pre-trained models using reinforcement learning or instruction tuning, the so-called finetuning step. In contrast, aligning frozen LLMs without any extra data is more appealing. This work explores the potential of the latter setting. We discover that by integrating self-evaluation and rewind mechanisms, unaligned LLMs can directly produce responses consistent with human preferences via self-boosting. We introduce a novel inference method, Rewindable Auto-regressive INference (RAIN), that allows pre-trained LLMs to evaluate their own generation and use the evaluation results to guid…  ( 9 min )
    [P] We're building a cloud for AI agents & AI apps, It's free and we're gradually open-sourcing the infra. Would love to hear your feedback!
    submitted by /u/mlejva [link] [comments]  ( 8 min )
    [D] Learning machine learning
    Hello everyone, I wanted to get into machine learning and learn about AI. Can someone help me with a roadmap? I would be really thankful. submitted by /u/Fooda234 [link] [comments]  ( 8 min )
    "[Discussion]" Advices for exams
Hello, I'm currently preparing for oral exams in which I'll be evaluated on my understanding and proficiency in using Decision Trees, Random Forests, Neural Networks, and Support Vector Machines for various machine learning tasks (mostly spatial data). I'm reaching out to this community to gain valuable insights and guidance to excel in these exams. What are some crucial lessons you've learned in your machine-learning journey? Whether it's about model selection, data preprocessing, or debugging, I'm all ears for your experiences. What are some rules you have learned through practical work that are not so extensively described in the classical literature? What are some mistakes that even professionals make when developing machine learning models? What are some common pitfalls to avoid when training neural networks? Thanks! submitted by /u/Aim_F0r_The_Moon [link] [comments]  ( 9 min )
    [R] How much data needed to train transformer
I'm trying to create a graph-transformer-based model for de novo drug design (using a graph transformer because I want to incorporate 3D data). I currently have 2 potential sources of primary data: PDBbind and CrossDocked2020. These would provide the protein-ligand structures. PDBbind is a more robust and higher-quality dataset from what I know, and easier to work with. The problem is that it only contains about 20,000 complexes, and I'm not sure if that is enough for training a transformer. CrossDocked2020 contains millions of entries, but I'm not sure about its quality and ease of use. Another dilemma is that I need/want to use a multi-task learning approach where the model is also trained on bioactivity data, not just the structural information. This would require supplementation from sources like PubChem, ChEMBL, BDB, etc., and then I would need to align the data so it all matches up. If anyone can provide some guidance I'd really appreciate it. submitted by /u/Present_Network1959 [link] [comments]  ( 9 min )
    [D] HELP for an upcoming presentation
I am supposed to deliver a 5-minute presentation on tree models, as part of a coursework requirement, to a graduate class on machine learning that I am enrolled in, and I couldn't be more stressed! This is my first time 'learning' ML and I don't even know what I don't know about this topic!! If you were attending my presentation on tree models, what would you like to see (assuming this is a new topic for you too, but please provide feedback based on your [hopefully] extensive knowledge of this topic)? Here is what I have so far (based on Google searches and some papers): -Introduction -Types of trees based on algorithms and techniques (basically classification trees and regression trees) -Then I am thinking of going off on a tangent about decision trees because I have no clue how to move this presentation forward -Real-world applications -Key takeaways (inserts clownface emoji) My prof asks the students questions about the topic as well. I am mostly concerned about WHAT to cover in 5 minutes without making it look under-researched. Any redirection/suggestions will be appreciated! submitted by /u/toomanymouthstofeed [link] [comments]  ( 9 min )
    [D] Predicting the next "thought"
(I'm a Software Engineer who knows almost nothing about ML / NLP, so apologies in advance if this doesn't make any sense.) I had a shower thought about whether tokenization could incorporate meaning, so the models could be trained on "thoughts" instead of subword tokens (which is probably closer to how we humans think). To expand a little bit, we could first cluster all the words (maybe using the current-day LLMs), then each token (or "thought") in the new scheme could represent a group of related words, followed by refinements that would be less and less important. For example, (cop_thought, cop_refinement) -> cop, but (cop_thought, police_refinement) -> police, etc. So the encoding step would involve an LLM (which could possibly be relatively smaller), whose output would go to the actual model, but the decoding would still be fairly straightforward. This could possibly free up some additional capacity in the actual model, assuming this heavy-lifting tokenization makes its job easier, but as I'm typing it, I guess I'm simply moving some of the semantic understanding happening in the hidden layers of current-day LLMs explicitly into the tokenization (encoding) step, which may not really change anything (if not make it worse). I'm still curious what folks think, whether there are any related efforts (and all the ways I'm wrong -- https://meta.wikimedia.org/wiki/Cunningham%27s_Law). Thanks! submitted by /u/avamsi [link] [comments]  ( 9 min )
    Suggestions for ML project to detect unconventional treatments [P]
    Keep in mind I'm very new to machine learning. I have an idea for a project where I train an AI to detect instances of someone being highly confident in a treatment for a condition, maybe biased toward more unconventional treatments. I was thinking that there may be many instances of people who posted that they were cured or nearly cured of such and such disease by an unconventional treatment on a forum about the condition. It may have been discussed for a little while, but ultimately buried in the pages of the forum. The plan would be to annotate instances I can find of such scenarios, maybe like this: [FIRST_PERSON_HIGH_CONFIDENCE_TREATMENT]I had severe [CONDITION]migraines[/CONDITION] for years, and nothing seemed to work. Then I tried [TREATMENT]grounding, where I walk barefoot on grass for 30 minutes every day[/TREATMENT]. I can [HIGH_CONFIDENCE]honestly say I've never felt better[/HIGH_CONFIDENCE]. My [CONDITION]migraines[/CONDITION] have reduced by 80%, and I'm [HIGH_CONFIDENCE]convinced this is a game-changer for me[/HIGH_CONFIDENCE].[/FIRST_PERSON_HIGH_CONFIDENCE_TREATMENT] Then train an AI with that data, so that it could detect cases of a person talking about themselves (to avoid hearsay) and saying, with high confidence, that a treatment worked for a condition. Then millions of forum posts could be fed to the AI to detect these, and the resulting data could be used to possibly discover effective treatments that are not in the mainstream. Any tips on getting started? I know almost nothing about this kind of stuff, like what models I should use, how to annotate it best (should I use relational labels?), whether to use a transformer or something else, stuff like that. Suggestions for books or other resources fit for a beginner that could help me learn how this could be done would be great too. submitted by /u/carbonflow45 [link] [comments]  ( 9 min )
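One common starting point for this kind of annotation (just a sketch, not a claim about the best design for the project) is to convert each labeled span into character offsets and train a standard token-classification/NER model, for example with spaCy; the example text and label names below are made up:

```python
import spacy
from spacy.tokens import DocBin

# One hypothetical annotated example; label names are made up.
text = ("I had severe migraines for years. Then I tried grounding, where I walk "
        "barefoot on grass for 30 minutes every day. Honestly, I've never felt better.")
spans = [(text.index("migraines"), text.index("migraines") + len("migraines"), "CONDITION"),
         (text.index("grounding"), text.index("grounding") + len("grounding"), "TREATMENT")]
TRAIN_DATA = [(text, spans)]

nlp = spacy.blank("en")
db = DocBin()
for doc_text, entities in TRAIN_DATA:
    doc = nlp.make_doc(doc_text)
    ents = [doc.char_span(start, end, label=label) for start, end, label in entities]
    doc.ents = [e for e in ents if e is not None]  # drop spans that don't align to token boundaries
    db.add(doc)
db.to_disk("train.spacy")
# Then train, e.g.: python -m spacy train config.cfg --paths.train train.spacy --paths.dev dev.spacy
```

The FIRST_PERSON and HIGH_CONFIDENCE aspects arguably fit better as a sentence- or document-level classifier trained alongside the span extractor, since they describe the whole statement rather than one contiguous span.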
    [D] How does 'self-attention' work in transformer models?
    I'm currently diving into the world of machine learning and transformers, and I'm trying to wrap my head around the concept of "attention" in transformer models. I've been reading papers and documentation, but I'm still struggling to fully grasp it. My Struggle: I get that attention involves multiplying "query" and "key" vectors to determine the importance of different words in a sequence, but I don't quite understand why this multiplication gives us a meaningful metric for importance. What I'm looking for: I'm comfortable with moderate level technicalities but require a deeper insight into the inner workings and rationale behind these mechanisms. Please share any insights, analogies, or technical details that can shed light on this concept. Thanks a bunch! submitted by /u/GraphicsMonster [link] [comments]  ( 9 min )
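A tiny NumPy sketch of single-head scaled dot-product attention may make it concrete: each entry of Q Kᵀ is the dot product between one token's query and another token's key, i.e. an unnormalized similarity score, and the softmax turns those scores into mixing weights over the value vectors (the shapes and random projections below are arbitrary):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Toy single-head attention: Q, K, V are (seq_len, d_k) matrices."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                       # (seq_len, seq_len) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # row-wise softmax
    return weights @ V                                    # weighted mix of the value vectors

# Tiny example: 3 tokens with 8-dim embeddings projected to 4-dim Q/K/V.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 8))                               # token embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 4)) for _ in range(3))  # random projections, just for shape
out = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)
print(out.shape)  # (3, 4)
```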
    [D] GPT-3.5-instruct beats GPT-4 at chess and is a ~1800 ELO chess player. Results of 150 games of GPT-3.5 vs stockfish and 30 of GPT-3.5 vs GPT-4.
99.7% of its 8000 moves were legal with the longest game going 147 moves. You can test it here: https://github.com/adamkarvonen/chess_gpt_eval More details here: https://twitter.com/a_karvonen/status/1705340535836221659 submitted by /u/seraine [link] [comments]  ( 9 min )
    [P] How to create an image dataset for Indian railways signals?
Hi everyone, I am working on a project that involves machine learning and computer vision. I want to train a model that can recognize and classify different types of signals used by the Indian railways. For this, I need a large and diverse image dataset of railway signals from various locations, angles, lighting conditions, etc. I have searched online for existing datasets, but I could not find any that suit my needs. So I wish to create my own dataset from scratch. However, I am not sure how to go about it. What are the best practices and tools for creating an image dataset? How do I collect, label, and organize the images? How do I ensure the quality and consistency of the data? submitted by /u/Responsible-Diver226 [link] [comments]  ( 9 min )
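For organizing and loading such a dataset, one simple and widely used convention (a sketch; the folder and class names are placeholders) is one directory per signal class, which torchvision's ImageFolder reads directly:

```python
# Directory layout: one folder per signal class, e.g.
#   signals/
#     home/      img_0001.jpg ...
#     distant/   ...
#     caution/   ...
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

tfm = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
dataset = datasets.ImageFolder("signals", transform=tfm)   # labels come from folder names
loader = DataLoader(dataset, batch_size=32, shuffle=True)
print(dataset.classes)   # e.g. ['caution', 'distant', 'home']
```

If you later need bounding boxes or per-pixel masks rather than whole-image classes, tools like LabelImg or CVAT are common choices, and a short labeling guideline document helps keep annotations consistent across lighting conditions and angles.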
    [D] Math in Machine Learning
    Hello, I am starting a ML course soon in college and I wanted to get a head start on the math part of things, since they keep saying the course is math heavy and hard. I know that it involves quite a lot of Linear Algebra, Calculus, and Probability and statistics, but what concepts in particular does ML focus on? If anyone has any Youtube or Udemy courses, as I have access to those, I would really appreciate it. For starters, I’m really aiming to just at least tackle the ML-specific math concepts. Thanks. submitted by /u/CrunchyMind [link] [comments]  ( 9 min )
    [R] ALMA: Proposed New 2-Step Training Method to Boost Translation Performance in Smaller Language Models
    TLDR: New training approach proposed to help smaller AI models to achieve state-of-the-art translation performance Large AI models like GPT-3 have good performance on translation tasks, but some smaller models struggle. Researchers from Johns Hopkins and Microsoft propose a new 2-stage fine-tuning method called ALMA that unlocks stronger translation abilities in smaller models with just 7-13 billion parameters. How it works: Fine-tune on monolingual data in non-English languages to improve comprehension Further fine-tune on small sets of high-quality human-translated parallel text The authors claim this achieves SOTA-level translation using far less data and compute than conventional methods: Matches performance of 175B parameter GPT-3 and 54B parameter NLLB with only 7-13B parameters Reaches NLLB-level quality with just 1 billion monolingual tokens and 18 hours of training I think this shows that smaller models can reach SOTA translation with specialized fine-tuning, so we may not need endlessly bigger datasets and models to get better performance. Looks like deliberate tuning targeting key language skills could be more important. Full summary here. Paper (preprint) is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [R] [D] HyperDreamBooth LoRA matrix shapes
I've been reading the HyperDreamBooth paper and am confused about the number of parameters and corresponding matrix shapes in section 4.1 and Figure 3 (below). [Figure 3 from the paper] Maybe there's something more than just matrix multiplication, because the numbers don't add up. If there are two matrices, A and B, of size (n, r) and (r, m), with r = 1, and you multiply them, then it implies that n + m = 386k, which seems to be a bit much for the number of parameters in a single layer. Then we have two matrices of size (a, r) and (r, b), where a = 100, b = 50, and there are 28k variables according to the figure - in section 4.1 they say 30k. If there are 30k, that would imply r = 200, for matrices of shape (100, 200) and (200, 50). I guess 386k and 28k are for the whole models, n and m differ from layer to layer, and r stays at one. Quite surprising to me that approximating an (n, m)-shaped matrix with a product of (n, 1) and (1, m) vectors would work. Even more surprising that apparently you could squeeze it further to (100, 1) and (50, 1). submitted by /u/Foxtr0t [link] [comments]  ( 9 min )
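A quick sanity check of the rank-r parameter counts discussed in the post (this is just the arithmetic, not anything taken from the paper):

```python
def lowrank_params(n, m, r):
    """Parameters in a rank-r factorization: A of shape (n, r) times B of shape (r, m)."""
    return n * r + r * m

# With r = 1 a single layer contributes n + m parameters, so 386k would have to be
# a sum over all adapted layers rather than a single layer.
print(lowrank_params(100, 50, 200))  # 100*200 + 200*50 = 30_000, matching the 30k in section 4.1
```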
    [D] Cleaning scraped TEXT; improving similarity search
    Hey everyone! Multi-part question. I have scraped text (I repeat, text, and not structured data such as tables or something) from a medical site, and I want to know how to clean it. And when I say clean, I don't mean removal of html tags and such. I already have the paragraphs in plain text but there is a lot of spammy stuff like "You are not signed in; subscribe to this newsletter; by checking this box, I agree to the terms and conditions, etc." This text is not the exact same in all the paragraphs but there is high similarity. I would have thought there would be many tools to clean text and remove unrelated chunks like these but all I have been able to find has to do with cleaning html tags, changing date-time format and so on. Am I missing something or is this actually difficult? Secondly, the spammy text I mentioned is from just one site. I will be eventually scaling to many sites and god knows what random text I'll have to clean then. I used OpenAI embeddings and cosine similarity on the medical text to find similar paragraphs. The results were not great. Is there a way to improve the similarity search? I will be trying FAISS next but wanted to know what else I can do. It was suggested to me to use a pretrained embedding model specific to medical data. However, I found only one such model which is 20 gigs! I'm just getting started with these, so, appreciate any help I can get. Thanks a ton! submitted by /u/yipra97 [link] [comments]  ( 9 min )
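For the first part, one workable approach (a sketch, assuming the spammy chunks are all highly similar to a handful of known examples; the model name is just a common default) is to embed each paragraph and drop the ones that sit too close to hand-picked boilerplate snippets:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence-embedding model works

# A few hand-picked examples of the junk you want removed.
boilerplate = [
    "You are not signed in.",
    "Subscribe to this newsletter",
    "By checking this box, I agree to the terms and conditions",
]
bp_emb = model.encode(boilerplate, convert_to_tensor=True, normalize_embeddings=True)

def drop_boilerplate(paragraphs, threshold=0.75):
    """Keep only paragraphs whose closest boilerplate example stays below the threshold."""
    emb = model.encode(paragraphs, convert_to_tensor=True, normalize_embeddings=True)
    sims = util.cos_sim(emb, bp_emb).max(dim=1).values   # closest boilerplate per paragraph
    return [p for p, s in zip(paragraphs, sims) if s.item() < threshold]
```

For the second part, removing the boilerplate before embedding (it pollutes the vectors) and choosing a domain-appropriate embedding model tend to matter more than the index itself; FAISS mainly changes retrieval speed, not ranking quality.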
    [D] Comparison of top ten LLMs
    Hey folks - I have been tasked with a project at work that is outside of my typical realm (non-technical background), and I would love any and all insight on it! I have been asked to compare the top ten LLMs for research before we implement an LLM for our company. So far my research has felt scattered because I'm not sure what directions to go in. I've been looking at things like open source vs closed source, parameters, tokens, what the license looks like (available for commercial use), and pricing. If anyone has thoughts on resources to look at or better ways to approach this, I would really appreciate it! submitted by /u/Greatvalueaidybryant [link] [comments]  ( 9 min )
    [P] Which MLOps framework to use?
    Relatively novice ML practitioner here. My research is on various medical image segmentation problems, including brain 3D US (glioma), lung CT (interstitial lung disease in scleroderma patients), etc., using the PyTorch ecosystem (probably including frameworks such as MONAI). I'll have to conduct several experiments on various model architectures and parameters in the coming months. Specifically, these are what I'm going to need: experiment tracking (model architecture, training configuration, hyperparameters, evaluation metrics); model storage (it would be nice if there were a better way to store my models' parameters other than keeping tons of .pth files on my hard disk or Google Drive); (optional) visualization (sample predictions of the model on the training or validation sets, maybe every 20 epochs or so). Would love to hear any suggestions from the community. I've found wandb, ClearML, Neptune, and Aim, but trying each of them individually would be too time-consuming considering my current schedule. Thanks in advance! submitted by /u/mimivirus2 [link] [comments]  ( 9 min )
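For reference, the core experiment-tracking loop looks roughly the same across these tools; a minimal sketch with wandb (the project name, config values, and the train/validate functions are placeholders for your own code):

```python
import random
import wandb

def train_one_epoch():   # placeholder for your real training step
    return random.random()

def validate():          # placeholder for your real validation metric (e.g. Dice score)
    return random.random()

run = wandb.init(
    project="lung-ct-segmentation",   # placeholder project name
    config={"arch": "unet", "lr": 1e-4, "batch_size": 4, "epochs": 200},
)

for epoch in range(run.config.epochs):
    wandb.log({
        "epoch": epoch,
        "train_loss": train_one_epoch(),
        "val_dice": validate(),
    })

wandb.save("best_model.pth")  # picks up the checkpoint file if it exists; artifacts are an alternative
wandb.finish()
```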
    [R] Introduction to Hierarchical Correlation Reconstruction (HCR) - family of methods between statistics and machine learning to model (joint) distributions
    submitted by /u/jarekduda [link] [comments]  ( 8 min )
    [R] Numerical Instability in Some Optimizers for training Neural Network
I found an interesting arXiv paper mentioning that some optimizers can cause numerical instability when training neural networks. Link: https://arxiv.org/abs/2307.16189 This could be a simple consideration for low-precision neural networks with 16-bit, and in the future 8-bit or 4-bit, formats. submitted by /u/Klutzy_Divide3485 [link] [comments]  ( 9 min )
  • Open

    Navigating a LaTeX file
I like generating long LaTeX documents from org-mode because, for one thing, org-mode has nice section folding. But not everyone I work with uses Emacs, so it's better to work in LaTeX directly rather than have Emacs generate LaTeX. AUCTeX has section folding for LaTeX documents, though so far I've only had limited success at […] Navigating a LaTeX file first appeared on John D. Cook.  ( 5 min )
    HTML entity data
    It’s surprisingly hard to find a complete list of HTML entities in the form of a data file. There are numerous sites that give lists, often incomplete, in a page formatted to be human-readable but not machine-readable. Here’s an XML file from the W3C. Here’s a two-column text file I created from the W3C data. HTML entity data first appeared on John D. Cook.  ( 5 min )
    Double-struck capital letters
    I’ve needed to use double-struck capital letters lately, also called blackboard bold. There are a few quirks in how they are represented in Unicode and in HTML entities, so I’m leaving some notes for myself here and for anyone else who might need to look this up. Unicode The double-struck capital letters are split into […] Double-struck capital letters first appeared on John D. Cook.  ( 6 min )
  • Open

    Use Case Language Models: Taming the LLM Beast – Part 1
    “Sometimes, you don’t know where you’re going until you get there.” – Schmarzo-ism? Yes, writing this blog turned into a journey. I started in one direction, but after several twists and turns, I ended up with this concept – that use case-centric language models can be combined into entity-centric language models that can support multiple… Read More »Use Case Language Models: Taming the LLM Beast – Part 1 The post Use Case Language Models: Taming the LLM Beast – Part 1 appeared first on Data Science Central.  ( 22 min )
  • Open

    RL with COMSOL Multiphysics
    Hi, has anyone ever attempted to do RL with COMSOL Multiphysics or any other FEM-based simulation tool? submitted by /u/Practical_Ad_8782 [link] [comments]  ( 8 min )
    Mini RL Lab
    Hi all, I'd like to share some of what I've learned over the last ~year getting up to speed with Python and RL. Mini RL Lab is a setup and workflow that works well for me to debug and experiment with concepts like agent algorithms, world models, planning, plasticity, transformers etc, and other beginners might find it a useful starting point for their own experiments. Link: https://github.com/modelbased/minirllab submitted by /u/thiagoazevedo [link] [comments]  ( 9 min )
    in counterfactual regret minimization, is it possible to compute regret for a move that was made more than 1 move ago?
As I understand it, in the rock-paper-scissors example they compute the regret based on the last state, and in poker they compute regret based on what happened last. But is it possible to compute regret for moves made two or more steps ago? Like "I wish that, given what I saw 10 moves ago, I had done x"? Or has this been possible from the start and I have just understood counterfactual regret minimization wrongly? submitted by /u/oniongarlic88 [link] [comments]  ( 9 min )
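For context, the rock-paper-scissors example is plain regret matching at a single decision point; full CFR applies the same regret update at every information set in the game tree (weighted by counterfactual reach probabilities), so regret is accumulated for decisions made many moves before the end of the game, not just the last one. A minimal regret-matching sketch against a fixed opponent (not full CFR):

```python
import numpy as np

ACTIONS = 3  # rock, paper, scissors
# PAYOFF[a][b] = utility of playing a against b
PAYOFF = np.array([[0, -1, 1],
                   [1, 0, -1],
                   [-1, 1, 0]])

def get_strategy(regret_sum):
    pos = np.maximum(regret_sum, 0)
    return pos / pos.sum() if pos.sum() > 0 else np.full(ACTIONS, 1 / ACTIONS)

rng = np.random.default_rng(0)
regret_sum = np.zeros(ACTIONS)
strategy_sum = np.zeros(ACTIONS)
opp_strategy = np.array([0.4, 0.3, 0.3])   # fixed opponent, just for the demo

for _ in range(10_000):
    strategy = get_strategy(regret_sum)
    strategy_sum += strategy
    a = rng.choice(ACTIONS, p=strategy)
    b = rng.choice(ACTIONS, p=opp_strategy)
    # Regret of not having played each alternative action against b.
    regret_sum += PAYOFF[:, b] - PAYOFF[a, b]

print(strategy_sum / strategy_sum.sum())   # average strategy -> mostly paper vs this opponent
```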
    Question about hypernetworks in RL
Hey everyone, I've been taking a look at hypernetworks and noticed they've been used in some cool ways in RL, with things like QMIX and this meta-learning paper. Quick recap: f is our regular neural network: it takes input x and outputs y using weights θ. g is the hypernetwork: it creates the weights θ for f based on its own weights φ and maybe the input x. Original paper: https://arxiv.org/abs/1609.09106 I watched this talk and it hinted that hypernetworks might even be better than our regular networks, even for regular supervised learning tasks. So I'm curious: What's the deal with hypernetworks? What makes them good or not versus regular neural networks? I get that they're good for meta-learning, but could they also be a game-changer for other things, like sample efficiency? Does anyone have thoughts or reads on this? Thanks! submitted by /u/LazyButAmbitious [link] [comments]  ( 9 min )
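For concreteness, a tiny PyTorch sketch of the f/g setup described above (layer sizes are arbitrary): the only trainable parameters are φ, the weights of g, and θ is produced on the fly by g.

```python
import torch
import torch.nn as nn

class HyperLinear(nn.Module):
    """Minimal hypernetwork sketch: g(z) emits the weights of a linear layer f."""

    def __init__(self, z_dim, in_dim, out_dim):
        super().__init__()
        self.in_dim, self.out_dim = in_dim, out_dim
        # g: maps a task/context embedding z to the parameters of f.
        self.g = nn.Sequential(
            nn.Linear(z_dim, 64), nn.ReLU(),
            nn.Linear(64, out_dim * in_dim + out_dim),
        )

    def forward(self, x, z):
        params = self.g(z)                                       # theta = g(z; phi)
        W = params[: self.out_dim * self.in_dim].view(self.out_dim, self.in_dim)
        b = params[self.out_dim * self.in_dim:]
        return torch.nn.functional.linear(x, W, b)               # f(x; theta)

layer = HyperLinear(z_dim=8, in_dim=16, out_dim=4)
x, z = torch.randn(5, 16), torch.randn(8)
print(layer(x, z).shape)   # torch.Size([5, 4])
```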
  • Open

    Help!
DON'T KNOW IF THIS POST BELONGS HERE BUT... I have zero knowledge about AI/ML. The thing is, my college is asking us to do projects on deep learning. They specifically asked us to pick a base paper from ACM journals or IEEE Transactions that has been published after 2020, implement it, and do some novelty work. And I have zero clue how to proceed. submitted by /u/um2_doma [link] [comments]  ( 9 min )
  • Open

    Smooth Nash Equilibria: Algorithms and Complexity. (arXiv:2309.12226v1 [cs.GT])
    A fundamental shortcoming of the concept of Nash equilibrium is its computational intractability: approximating Nash equilibria in normal-form games is PPAD-hard. In this paper, inspired by the ideas of smoothed analysis, we introduce a relaxed variant of Nash equilibrium called $\sigma$-smooth Nash equilibrium, for a smoothness parameter $\sigma$. In a $\sigma$-smooth Nash equilibrium, players only need to achieve utility at least as high as their best deviation to a $\sigma$-smooth strategy, which is a distribution that does not put too much mass (as parametrized by $\sigma$) on any fixed action. We distinguish two variants of $\sigma$-smooth Nash equilibria: strong $\sigma$-smooth Nash equilibria, in which players are required to play $\sigma$-smooth strategies under equilibrium play, and weak $\sigma$-smooth Nash equilibria, where there is no such requirement. We show that both weak and strong $\sigma$-smooth Nash equilibria have superior computational properties to Nash equilibria: when $\sigma$ as well as an approximation parameter $\epsilon$ and the number of players are all constants, there is a constant-time randomized algorithm to find a weak $\epsilon$-approximate $\sigma$-smooth Nash equilibrium in normal-form games. In the same parameter regime, there is a polynomial-time deterministic algorithm to find a strong $\epsilon$-approximate $\sigma$-smooth Nash equilibrium in a normal-form game. These results stand in contrast to the optimal algorithm for computing $\epsilon$-approximate Nash equilibria, which cannot run in faster than quasipolynomial-time. We complement our upper bounds by showing that when either $\sigma$ or $\epsilon$ is an inverse polynomial, finding a weak $\epsilon$-approximate $\sigma$-smooth Nash equilibria becomes computationally intractable.  ( 3 min )
    Identification of pneumonia on chest x-ray images through machine learning. (arXiv:2309.11995v1 [eess.IV])
Pneumonia is the leading infectious cause of infant death in the world. When identified early, it is possible to alter the prognosis of the patient; imaging exams can help in the diagnostic confirmation. Performing and interpreting the exams as soon as possible is vital for good treatment, with the most common exam for this pathology being the chest X-ray. The objective of this study was to develop software that identifies the presence or absence of pneumonia in chest radiographs. The software was developed as a computational model based on machine learning using the transfer learning technique. For the training process, images were collected from a database available online with children's chest X-ray images taken at a hospital in China. After training, the model was then exposed to new images, achieving relevant results on identifying such pathology, reaching 98% sensitivity and 97.3% specificity for the sample used for testing. It can be concluded that it is possible to develop software that identifies pneumonia in chest X-ray images.  ( 2 min )
    LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent. (arXiv:2309.12311v1 [cs.CV])
    3D visual grounding is a critical skill for household robots, enabling them to navigate, manipulate objects, and answer questions based on their environment. While existing approaches often rely on extensive labeled data or exhibit limitations in handling complex language queries, we propose LLM-Grounder, a novel zero-shot, open-vocabulary, Large Language Model (LLM)-based 3D visual grounding pipeline. LLM-Grounder utilizes an LLM to decompose complex natural language queries into semantic constituents and employs a visual grounding tool, such as OpenScene or LERF, to identify objects in a 3D scene. The LLM then evaluates the spatial and commonsense relations among the proposed objects to make a final grounding decision. Our method does not require any labeled training data and can generalize to novel 3D scenes and arbitrary text queries. We evaluate LLM-Grounder on the ScanRefer benchmark and demonstrate state-of-the-art zero-shot grounding accuracy. Our findings indicate that LLMs significantly improve the grounding capability, especially for complex language queries, making LLM-Grounder an effective approach for 3D vision-language tasks in robotics. Videos and interactive demos can be found on the project website https://chat-with-nerf.github.io/ .  ( 2 min )
    What Learned Representations and Influence Functions Can Tell Us About Adversarial Examples. (arXiv:2309.10916v2 [cs.LG] UPDATED)
    Adversarial examples, deliberately crafted using small perturbations to fool deep neural networks, were first studied in image processing and more recently in NLP. While approaches to detecting adversarial examples in NLP have largely relied on search over input perturbations, image processing has seen a range of techniques that aim to characterise adversarial subspaces over the learned representations. In this paper, we adapt two such approaches to NLP, one based on nearest neighbors and influence functions and one on Mahalanobis distances. The former in particular produces a state-of-the-art detector when compared against several strong baselines; moreover, the novel use of influence functions provides insight into how the nature of adversarial example subspaces in NLP relate to those in image processing, and also how they differ depending on the kind of NLP task.  ( 2 min )
    Neural-BO: A Black-box Optimization Algorithm using Deep Neural Networks. (arXiv:2303.01682v2 [cs.LG] UPDATED)
    Bayesian Optimization (BO) is an effective approach for global optimization of black-box functions when function evaluations are expensive. Most prior works use Gaussian processes to model the black-box function, however, the use of kernels in Gaussian processes leads to two problems: first, the kernel-based methods scale poorly with the number of data points and second, kernel methods are usually not effective on complex structured high dimensional data due to curse of dimensionality. Therefore, we propose a novel black-box optimization algorithm where the black-box function is modeled using a neural network. Our algorithm does not need a Bayesian neural network to estimate predictive uncertainty and is therefore computationally favorable. We analyze the theoretical behavior of our algorithm in terms of regret bound using advances in NTK theory showing its efficient convergence. We perform experiments with both synthetic and real-world optimization tasks and show that our algorithm is more sample efficient compared to existing methods.  ( 2 min )
    GrACE: Generation using Associated Code Edits. (arXiv:2305.14129v3 [cs.SE] UPDATED)
    Developers expend a significant amount of time in editing code for a variety of reasons such as bug fixing or adding new features. Designing effective methods to predict code edits has been an active yet challenging area of research due to the diversity of code edits and the difficulty of capturing the developer intent. In this work, we address these challenges by endowing pre-trained large language models (LLMs) of code with the knowledge of prior, relevant edits. The generative capability of the LLMs helps address the diversity in code changes and conditioning code generation on prior edits helps capture the latent developer intent. We evaluate two well-known LLMs, Codex and CodeT5, in zero-shot and fine-tuning settings respectively. In our experiments with two datasets, the knowledge of prior edits boosts the performance of the LLMs significantly and enables them to generate 29% and 54% more correctly edited code in top-1 suggestions relative to the current state-of-the-art symbolic and neural approaches, respectively.  ( 2 min )
    CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model. (arXiv:2305.06908v3 [cs.SD] UPDATED)
Denoising diffusion probabilistic models (DDPMs) have shown promising performance for speech synthesis. However, a large number of iterative steps are required to achieve high sample quality, which restricts the inference speed. Maintaining sample quality while increasing sampling speed has become a challenging task. In this paper, we propose a "Co"nsistency "Mo"del-based "Speech" synthesis method, CoMoSpeech, which achieves speech synthesis through a single diffusion sampling step while maintaining high audio quality. The consistency constraint is applied to distill a consistency model from a well-designed diffusion-based teacher model, which ultimately yields superior performances in the distilled CoMoSpeech. Our experiments show that by generating audio recordings by a single sampling step, the CoMoSpeech achieves an inference speed more than 150 times faster than real-time on a single NVIDIA A100 GPU, which is comparable to FastSpeech2, making diffusion-sampling based speech synthesis truly practical. Meanwhile, objective and subjective evaluations on text-to-speech and singing voice synthesis show that the proposed teacher models yield the best audio quality, and the one-step sampling based CoMoSpeech achieves the best inference speed with better or comparable audio quality to other conventional multi-step diffusion model baselines. Audio samples are available at https://comospeech.github.io/.  ( 3 min )
    Stochastic stiffness identification and response estimation of Timoshenko beams via physics-informed Gaussian processes. (arXiv:2309.11875v1 [cs.LG])
    Machine learning models trained with structural health monitoring data have become a powerful tool for system identification. This paper presents a physics-informed Gaussian process (GP) model for Timoshenko beam elements. The model is constructed as a multi-output GP with covariance and cross-covariance kernels analytically derived based on the differential equations for deflections, rotations, strains, bending moments, shear forces and applied loads. Stiffness identification is performed in a Bayesian format by maximising a posterior model through a Markov chain Monte Carlo method, yielding a stochastic model for the structural parameters. The optimised GP model is further employed for probabilistic predictions of unobserved responses. Additionally, an entropy-based method for physics-informed sensor placement optimisation is presented, exploiting heterogeneous sensor position information and structural boundary conditions built into the GP model. Results demonstrate that the proposed approach is effective at identifying structural parameters and is capable of fusing data from heterogeneous and multi-fidelity sensors. Probabilistic predictions of structural responses and internal forces are in closer agreement with measured data. We validate our model with an experimental setup and discuss the quality and uncertainty of the obtained results. The proposed approach has potential applications in the field of structural health monitoring (SHM) for both mechanical and structural systems.  ( 3 min )
    fakenewsbr: A Fake News Detection Platform for Brazilian Portuguese. (arXiv:2309.11052v2 [cs.CL] UPDATED)
    The proliferation of fake news has become a significant concern in recent times due to its potential to spread misinformation and manipulate public opinion. This paper presents a comprehensive study on detecting fake news in Brazilian Portuguese, focusing on journalistic-type news. We propose a machine learning-based approach that leverages natural language processing techniques, including TF-IDF and Word2Vec, to extract features from textual data. We evaluate the performance of various classification algorithms, such as logistic regression, support vector machine, random forest, AdaBoost, and LightGBM, on a dataset containing both true and fake news articles. The proposed approach achieves high accuracy and F1-Score, demonstrating its effectiveness in identifying fake news. Additionally, we developed a user-friendly web platform, fakenewsbr.com, to facilitate the verification of news articles' veracity. Our platform provides real-time analysis, allowing users to assess the likelihood of fake news articles. Through empirical analysis and comparative studies, we demonstrate the potential of our approach to contribute to the fight against the spread of fake news and promote more informed media consumption.  ( 3 min )
    Improving VTE Identification through Adaptive NLP Model Selection and Clinical Expert Rule-based Classifier from Radiology Reports. (arXiv:2309.12273v1 [cs.CL])
    Rapid and accurate identification of Venous thromboembolism (VTE), a severe cardiovascular condition including deep vein thrombosis (DVT) and pulmonary embolism (PE), is important for effective treatment. Leveraging Natural Language Processing (NLP) on radiology reports, automated methods have shown promising advancements in identifying VTE events from retrospective data cohorts or aiding clinical experts in identifying VTE events from radiology reports. However, effectively training Deep Learning (DL) and NLP models is challenging due to limited labeled medical text data, the complexity and heterogeneity of radiology reports, and data imbalance. This study proposes novel combinations of DL methods with data augmentation, adaptive pre-trained NLP model selection, and a clinical expert NLP rule-based classifier to improve the accuracy of VTE identification in unstructured (free-text) radiology reports. Our experimental results demonstrate the model's efficacy, achieving an impressive 97\% accuracy and 97\% F1 score in predicting DVT, and an outstanding 98.3\% accuracy and 98.4\% F1 score in predicting PE. These findings emphasize the model's robustness and its potential to significantly contribute to VTE research.  ( 2 min )
    Face Identity-Aware Disentanglement in StyleGAN. (arXiv:2309.12033v1 [cs.CV])
    Conditional GANs are frequently used for manipulating the attributes of face images, such as expression, hairstyle, pose, or age. Even though the state-of-the-art models successfully modify the requested attributes, they simultaneously modify other important characteristics of the image, such as a person's identity. In this paper, we focus on solving this problem by introducing PluGeN4Faces, a plugin to StyleGAN, which explicitly disentangles face attributes from a person's identity. Our key idea is to perform training on images retrieved from movie frames, where a given person appears in various poses and with different attributes. By applying a type of contrastive loss, we encourage the model to group images of the same person in similar regions of latent space. Our experiments demonstrate that the modifications of face attributes performed by PluGeN4Faces are significantly less invasive on the remaining characteristics of the image than in the existing state-of-the-art models.  ( 2 min )
    Nonparametric and Regularized Dynamical Wasserstein Barycenters for Sequential Observations. (arXiv:2210.01918v3 [cs.LG] UPDATED)
    We consider probabilistic models for sequential observations which exhibit gradual transitions among a finite number of states. We are particularly motivated by applications such as human activity analysis where observed accelerometer time series contains segments representing distinct activities, which we call pure states, as well as periods characterized by continuous transition among these pure states. To capture this transitory behavior, the dynamical Wasserstein barycenter (DWB) model of Cheng et al. in 2021 [1] associates with each pure state a data-generating distribution and models the continuous transitions among these states as a Wasserstein barycenter of these distributions with dynamically evolving weights. Focusing on the univariate case where Wasserstein distances and barycenters can be computed in closed form, we extend [1], specifically by relaxing the parameterization of the pure states as Gaussian distributions. We highlight issues related to the uniqueness in identifying the model parameters as well as uncertainties induced when estimating a dynamically evolving distribution from a limited number of samples. To ameliorate non-uniqueness, we introduce regularization that imposes temporal smoothness on the dynamics of the barycentric weights. A quantile-based approximation of the pure state distributions yields a finite dimensional estimation problem which we numerically solve using cyclic descent alternating between updates to the pure-state quantile functions and the barycentric weights. We demonstrate the utility of the proposed algorithm in segmenting both simulated and real world human activity time series.  ( 3 min )
    Physics-informed State-space Neural Networks for Transport Phenomena. (arXiv:2309.12211v1 [cs.LG])
    This work introduces Physics-informed State-space neural network Models (PSMs), a novel solution to achieving real-time optimization, flexibility, and fault tolerance in autonomous systems, particularly in transport-dominated systems such as chemical, biomedical, and power plants. Traditional data-driven methods fall short due to a lack of physical constraints like mass conservation; PSMs address this issue by training deep neural networks with sensor data while informing them with the components' Partial Differential Equations (PDEs), resulting in a physics-constrained, end-to-end differentiable forward dynamics model. Through two in silico experiments - a heated channel and a cooling system loop - we demonstrate that PSMs offer a more accurate approach than purely data-driven models. Beyond accuracy, there are several compelling use cases for PSMs. In this work, we showcase two: the creation of a nonlinear supervisory controller through a sequentially updated state-space representation and the proposal of a diagnostic algorithm using residuals from each of the PDEs. The former demonstrates the ability of PSMs to handle both constant and time-dependent constraints, while the latter illustrates their value in system diagnostics and fault detection. We further posit that PSMs could serve as a foundation for Digital Twins, constantly updated digital representations of physical systems.  ( 2 min )
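    The core idea of physics-informing a neural forward model is to add the residual of the governing PDE to the data-fitting loss. The sketch below illustrates that idea with a generic 1-D advection equation u_t + c u_x = 0 standing in for the transport physics; it is not the authors' state-space architecture, and the network size, advection speed, and sensor data are illustrative placeholders.

        # Minimal sketch (not the PSM implementation): a neural model of u(x, t)
        # penalized by the residual of a 1-D advection PDE, u_t + c * u_x = 0.
        import torch
        import torch.nn as nn

        torch.manual_seed(0)
        c = 1.0                      # assumed advection speed
        net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))

        def pde_residual(x, t):
            x.requires_grad_(True); t.requires_grad_(True)
            u = net(torch.cat([x, t], dim=1))
            u_x = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
            u_t = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
            return u_t + c * u_x

        # Placeholder "sensor" data; in practice these come from measurements.
        x_obs = torch.rand(32, 1); t_obs = torch.rand(32, 1)
        u_obs = torch.sin(x_obs - c * t_obs)

        opt = torch.optim.Adam(net.parameters(), lr=1e-3)
        for _ in range(200):
            opt.zero_grad()
            data_loss = ((net(torch.cat([x_obs, t_obs], 1)) - u_obs) ** 2).mean()
            x_col = torch.rand(64, 1); t_col = torch.rand(64, 1)   # collocation points
            phys_loss = (pde_residual(x_col, t_col) ** 2).mean()
            (data_loss + phys_loss).backward()
            opt.step()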
    Generating Hierarchical Structures for Improved Time Series Classification Using Stochastic Splitting Functions. (arXiv:2309.11963v1 [cs.LG])
    This study introduces a novel hierarchical divisive clustering approach with stochastic splitting functions (SSFs) to enhance classification performance in multi-class datasets through hierarchical classification (HC). The method has the unique capability of generating hierarchy without requiring explicit information, making it suitable for datasets lacking prior knowledge of hierarchy. By systematically dividing classes into two subsets based on their discriminability according to the classifier, the proposed approach constructs a binary tree representation of hierarchical classes. The approach is evaluated on 46 multi-class time series datasets using popular classifiers (svm and rocket) and SSFs (potr, srtr, and lsoo). The results reveal that the approach significantly improves classification performance in approximately half and a third of the datasets when using rocket and svm as the classifier, respectively. The study also explores the relationship between dataset features and HC performance. While the number of classes and flat classification (FC) score show consistent significance, variations are observed with different splitting functions. Overall, the proposed approach presents a promising strategy for enhancing classification by generating hierarchical structure in multi-class time series datasets. Future research directions involve exploring different splitting functions, classifiers, and hierarchy structures, as well as applying the approach to diverse domains beyond time series data. The source code is made openly available to facilitate reproducibility and further exploration of the method.  ( 2 min )
    Compositional Foundation Models for Hierarchical Planning. (arXiv:2309.08587v2 [cs.LG] UPDATED)
    To make effective decisions in novel environments with long-horizon goals, it is crucial to engage in hierarchical reasoning across spatial and temporal scales. This entails planning abstract subgoal sequences, visually reasoning about the underlying plans, and executing actions in accordance with the devised plan through visual-motor control. We propose Compositional Foundation Models for Hierarchical Planning (HiP), a foundation model which leverages multiple expert foundation models, each trained individually on language, vision, and action data, and composes them to solve long-horizon tasks. We use a large language model to construct symbolic plans that are grounded in the environment through a large video diffusion model. Generated video plans are then grounded to visual-motor control through an inverse dynamics model that infers actions from the generated videos. To enable effective reasoning within this hierarchy, we enforce consistency between the models via iterative refinement. We illustrate the efficacy and adaptability of our approach in three different long-horizon table-top manipulation tasks.  ( 2 min )
    SR-PredictAO: Session-based Recommendation with High-Capability Predictor Add-On. (arXiv:2309.12218v1 [cs.IR])
    Session-based recommendation, which aims to predict the user's next item click based only on the information in a single session, even in the presence of some random user behavior, is a complex problem. It requires a high-capability model for predicting the user's next action. Most (if not all) existing models follow the encoder-predictor paradigm, where all studies focus on optimizing the encoder module extensively but ignore how to optimize the predictor module. In this paper, we identify the critical issue of the low-capability predictor module in existing models. Motivated by this, we propose a novel framework called \emph{\underline{S}ession-based \underline{R}ecommendation with \underline{Pred}ictor \underline{A}dd-\underline{O}n} (SR-PredictAO). In this framework, we propose a high-capability predictor module which can alleviate the effect of random user behavior on prediction. It is worth mentioning that this framework can be applied to any existing model, which opens opportunities for further optimization. Extensive experiments on two real benchmark datasets with three state-of-the-art models show that \emph{SR-PredictAO} outperforms the current state-of-the-art model by up to 2.9\% in HR@20 and 2.3\% in MRR@20. More importantly, the improvement is consistent across almost all the existing models on all datasets, which can be regarded as a significant contribution to the field.  ( 2 min )
    Survey of Action Recognition, Spotting and Spatio-Temporal Localization in Soccer -- Current Trends and Research Perspectives. (arXiv:2309.12067v1 [cs.CV])
    Action scene understanding in soccer is a challenging task due to the complex and dynamic nature of the game, as well as the interactions between players. This article provides a comprehensive overview of this task divided into action recognition, spotting, and spatio-temporal action localization, with a particular emphasis on the modalities used and multimodal methods. We explore the publicly available data sources and metrics used to evaluate models' performance. The article reviews recent state-of-the-art methods that leverage deep learning techniques and traditional methods. We focus on multimodal methods, which integrate information from multiple sources, such as video and audio data, and also those that represent one source in various ways. The advantages and limitations of methods are discussed, along with their potential for improving the accuracy and robustness of models. Finally, the article highlights some of the open research questions and future directions in the field of soccer action recognition, including the potential for multimodal methods to advance this field. Overall, this survey provides a valuable resource for researchers interested in the field of action scene understanding in soccer.  ( 2 min )
    Contrastive Pseudo Learning for Open-World DeepFake Attribution. (arXiv:2309.11132v1 [cs.CV] CROSS LISTED)
    The challenge of sourcing attribution for forgery faces has gained widespread attention due to the rapid development of generative techniques. While many recent works have taken essential steps on GAN-generated faces, more threatening attacks related to identity swapping or expression transferring are still overlooked. The forgery traces hidden in unknown attacks from open-world unlabeled faces also remain under-explored. To push the related frontier research, we introduce a new benchmark called Open-World DeepFake Attribution (OW-DFA), which aims to evaluate attribution performance against various types of fake faces under open-world scenarios. Meanwhile, we propose a novel framework named Contrastive Pseudo Learning (CPL) for the OW-DFA task through 1) introducing a Global-Local Voting module to guide the feature alignment of forged faces with different manipulated regions, and 2) designing a Confidence-based Soft Pseudo-label strategy to mitigate the pseudo-noise caused by similar methods in the unlabeled set. In addition, we extend the CPL framework with a multi-stage paradigm that leverages pre-training and iterative learning to further enhance traceability performance. Extensive experiments verify the superiority of our proposed method on OW-DFA and also demonstrate the interpretability of the deepfake attribution task and its impact on improving the security of the deepfake detection area.  ( 2 min )
    Parallelizing non-linear sequential models over the sequence length. (arXiv:2309.12252v1 [cs.LG])
    Sequential models, such as Recurrent Neural Networks and Neural Ordinary Differential Equations, have long suffered from slow training due to their inherent sequential nature. For many years this bottleneck has persisted, as many thought sequential models could not be parallelized. We challenge this long-held belief with our parallel algorithm that accelerates GPU evaluation of sequential models by up to 3 orders of magnitude without compromising output accuracy. The algorithm does not need any special structure in the sequential models' architecture, making it applicable to a wide range of architectures. Using our method, training sequential models can be more than 10 times faster than the common sequential method without any meaningful difference in the training results. Leveraging this accelerated training, we discovered the efficacy of the Gated Recurrent Unit in a long time series classification problem with 17k time samples. By overcoming the training bottleneck, our work serves as the first step to unlock the potential of non-linear sequential models for long sequence problems.  ( 2 min )
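    The abstract does not spell out the parallel algorithm, but one common way to evaluate a non-linear recurrence in parallel over the sequence is fixed-point iteration: every time step is updated simultaneously from the previous sweep's estimates until the whole trajectory converges. The sketch below shows only this generic idea on a toy scalar recurrence and should not be read as the paper's method.

        # Generic parallel-in-time evaluation of h_t = tanh(a * h_{t-1} + x_t) by
        # fixed-point iteration: each sweep updates every time step at once.
        import numpy as np

        rng = np.random.default_rng(0)
        T, a = 1000, 0.5
        x = rng.normal(size=T)

        # Serial reference.
        h_serial = np.zeros(T)
        prev = 0.0
        for t in range(T):
            prev = np.tanh(a * prev + x[t])
            h_serial[t] = prev

        # Parallel fixed point: h <- tanh(a * shift(h) + x), repeated until convergence.
        h = np.zeros(T)
        for _ in range(50):
            h_prev = np.concatenate(([0.0], h[:-1]))   # h_{t-1} for every t at once
            h_new = np.tanh(a * h_prev + x)
            converged = np.max(np.abs(h_new - h)) < 1e-10
            h = h_new
            if converged:
                break

        print(np.max(np.abs(h - h_serial)))  # close to zero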
    Cross-scale Multi-instance Learning for Pathological Image Diagnosis. (arXiv:2304.00216v2 [eess.IV] CROSS LISTED)
    Analyzing high resolution whole slide images (WSIs) with regard to information across multiple scales poses a significant challenge in digital pathology. Multi-instance learning (MIL) is a common solution for working with high resolution images by classifying bags of objects (i.e. sets of smaller image patches). However, such processing is typically performed at a single scale (e.g., 20x magnification) of WSIs, disregarding the vital inter-scale information that is key to diagnoses by human pathologists. In this study, we propose a novel cross-scale MIL algorithm to explicitly aggregate inter-scale relationships into a single MIL network for pathological image diagnosis. The contribution of this paper is three-fold: (1) A novel cross-scale MIL (CS-MIL) algorithm that integrates the multi-scale information and the inter-scale relationships is proposed; (2) A toy dataset with scale-specific morphological features is created and released to examine and visualize differential cross-scale attention; (3) Superior performance on both in-house and public datasets is demonstrated by our simple cross-scale MIL strategy. The official implementation is publicly available at https://github.com/hrlblab/CS-MIL.  ( 2 min )
    Soft Merging: A Flexible and Robust Soft Model Merging Approach for Enhanced Neural Network Performance. (arXiv:2309.12259v1 [cs.LG])
    Stochastic Gradient Descent (SGD), a widely used optimization algorithm in deep learning, is often limited to converging to local optima due to the non-convex nature of the problem. Leveraging these local optima to improve model performance remains a challenging task. Given the inherent complexity of neural networks, the simple arithmetic averaging of the obtained local-optimum models results in undesirable outcomes. This paper proposes a {\em soft merging} method that facilitates rapid merging of multiple models, simplifies the merging of specific parts of neural networks, and enhances robustness against malicious models with extreme values. This is achieved by learning gate parameters through a surrogate of the $l_0$ norm using the hard concrete distribution without modifying the model weights of the given local-optimum models. This merging process not only enhances the model performance by converging to a better local optimum, but also minimizes computational costs, offering an efficient and explicit learning process integrated with stochastic gradient descent. Thorough experiments underscore the effectiveness and superior performance of the merged neural networks.
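    A simplified view of gated merging is sketched below: two frozen local-optimum models are mixed tensor-by-tensor through learnable gates, and only the gates are trained. The paper's hard concrete relaxation of the $l_0$ surrogate is replaced here by a plain sigmoid gate for brevity, so this is an assumption-laden illustration rather than the authors' implementation.

        # Simplified sketch, not the paper's implementation: merge two frozen models
        # by learning a per-tensor gate g in [0, 1] that mixes their weights.
        import torch
        import torch.nn as nn

        torch.manual_seed(0)

        def make_model():
            return nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))

        model_a, model_b = make_model(), make_model()    # two local optima (placeholders)
        for p in list(model_a.parameters()) + list(model_b.parameters()):
            p.requires_grad_(False)

        merged = make_model()
        gate_logits = nn.ParameterList(
            [nn.Parameter(torch.zeros(())) for _ in merged.state_dict()])

        x = torch.randn(64, 4)
        y = torch.randint(0, 2, (64,))
        opt = torch.optim.Adam(gate_logits.parameters(), lr=0.1)

        for _ in range(100):
            # Rebuild the merged weights from the gates at every step.
            sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
            new_sd = {}
            for (name, _), logit in zip(sd_a.items(), gate_logits):
                g = torch.sigmoid(logit)
                new_sd[name] = g * sd_a[name] + (1 - g) * sd_b[name]
            out = torch.func.functional_call(merged, new_sd, (x,))
            loss = nn.functional.cross_entropy(out, y)
            opt.zero_grad(); loss.backward(); opt.step()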
    Adaptive Input-image Normalization for Solving Mode Collapse Problem in GAN-based X-ray Images. (arXiv:2309.12245v1 [eess.IV])
    Biomedical image datasets can be imbalanced due to the rarity of targeted diseases. Generative Adversarial Networks play a key role in addressing this imbalance by enabling the generation of synthetic images to augment datasets. It is important to generate synthetic images that incorporate a diverse range of features to accurately represent the distribution of features present in the training imagery. Furthermore, the absence of diverse features in synthetic images can degrade the performance of machine learning classifiers. The mode collapse problem impacts Generative Adversarial Networks' capacity to generate diversified images. Mode collapse comes in two varieties: intra-class and inter-class. In this paper, both varieties of the mode collapse problem are investigated, and their subsequent impact on the diversity of synthetic X-ray images is evaluated. This work contributes an empirical demonstration of the benefits of integrating the adaptive input-image normalization with the Deep Convolutional GAN and Auxiliary Classifier GAN to alleviate the mode collapse problems. Synthetically generated images are utilized for data augmentation and training a Vision Transformer model. The classification performance of the model is evaluated using accuracy, recall, and precision scores. Results demonstrate that the DCGAN and the ACGAN with adaptive input-image normalization outperform the DCGAN and ACGAN with un-normalized X-ray images as evidenced by the superior diversity scores and classification scores.
    Identifying Expert Behavior in Offline Training Datasets Improves Behavioral Cloning of Robotic Manipulation Policies. (arXiv:2301.13019v2 [cs.RO] UPDATED)
    This paper presents our solution for the Real Robot Challenge (RRC) III, a competition featured in the NeurIPS 2022 Competition Track, aimed at addressing dexterous robotic manipulation tasks through learning from pre-collected offline data. Participants were provided with two types of datasets for each task: expert and mixed datasets with varying skill levels. The simplest offline policy learning algorithm, Behavioral Cloning (BC), performed remarkably well when trained on expert datasets, outperforming even the most advanced offline reinforcement learning (RL) algorithms. However, BC's performance deteriorated when applied to mixed datasets, and the performance of offline RL algorithms was also unsatisfactory. Upon examining the mixed datasets, we observed that they contained a significant amount of expert data, although this data was unlabeled. To address this issue, we proposed a semi-supervised learning-based classifier to identify the underlying expert behavior within mixed datasets, effectively isolating the expert data. To further enhance BC's performance, we leveraged the geometric symmetry of the RRC arena to augment the training dataset through mathematical transformations. In the end, our submission surpassed that of all other participants, even those who employed complex offline RL algorithms and intricate data processing and feature engineering techniques.
    Empowering Precision Medicine: AI-Driven Schizophrenia Diagnosis via EEG Signals: A Comprehensive Review from 2002-2023. (arXiv:2309.12202v1 [eess.SP])
    Schizophrenia (SZ) is a prevalent mental disorder characterized by cognitive, emotional, and behavioral changes. Symptoms of SZ include hallucinations, illusions, delusions, lack of motivation, and difficulties in concentration. Diagnosing SZ involves employing various tools, including clinical interviews, physical examinations, psychological evaluations, the Diagnostic and Statistical Manual of Mental Disorders (DSM), and neuroimaging techniques. Electroencephalography (EEG) recording is a significant functional neuroimaging modality that provides valuable insights into brain function during SZ. However, EEG signal analysis poses challenges for neurologists and scientists due to the presence of artifacts, long-term recordings, and the utilization of multiple channels. To address these challenges, researchers have introduced artificial intelligence (AI) techniques, encompassing conventional machine learning (ML) and deep learning (DL) methods, to aid in SZ diagnosis. This study reviews papers focused on SZ diagnosis utilizing EEG signals and AI methods. The introduction section provides a comprehensive explanation of SZ diagnosis methods and intervention techniques. Subsequently, review papers in this field are discussed, followed by an introduction to the AI methods employed for SZ diagnosis and a summary of relevant papers presented in tabular form. Additionally, this study reports on the most significant challenges encountered in SZ diagnosis, as identified through a review of papers in this field. Future directions to overcome these challenges are also addressed. The discussion section examines the specific details of each paper, culminating in the presentation of conclusions and findings.
    Hyena Neural Operator for Partial Differential Equations. (arXiv:2306.16524v2 [cs.LG] UPDATED)
    Numerically solving partial differential equations typically requires fine discretization to resolve necessary spatiotemporal scales, which can be computationally expensive. Recent advances in deep learning have provided a new approach to solving partial differential equations that involves the use of neural operators. Neural operators are neural network architectures that learn mappings between function spaces and have the capability to solve partial differential equations based on data. This study utilizes a novel neural operator called Hyena, which employs a long convolutional filter that is parameterized by a multilayer perceptron. The Hyena operator enjoys sub-quadratic complexity and uses a state-space model to parameterize a long convolution with a global receptive field. This mechanism enhances the model's comprehension of the input's context and enables data-dependent weights for different partial differential equation instances. To measure how effective the layers are in solving partial differential equations, we conduct experiments on the Diffusion-Reaction equation and the Navier-Stokes equation. Our findings indicate that the Hyena neural operator can serve as an efficient and accurate model for learning the solution operators of partial differential equations. The data and code used can be found at: https://github.com/Saupatil07/Hyena-Neural-Operator  ( 2 min )
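    The central mechanism, a convolution filter as long as the input that is generated implicitly by a small MLP and applied via FFT, can be sketched compactly. The code below shows only that mechanism on random tensors; it omits the gating, state-space parameterization, and other components of the full Hyena operator, and the layer sizes are arbitrary.

        # Illustrative sketch of an implicit long convolution: the filter is produced
        # by a small MLP over normalized positions and applied with FFT, giving a
        # global receptive field at O(L log L) cost. Not the full Hyena operator.
        import torch
        import torch.nn as nn

        class ImplicitLongConv(nn.Module):
            def __init__(self, d_model, hidden=32):
                super().__init__()
                self.filter_mlp = nn.Sequential(
                    nn.Linear(1, hidden), nn.GELU(), nn.Linear(hidden, d_model))

            def forward(self, u):                      # u: (batch, length, d_model)
                B, L, D = u.shape
                pos = torch.linspace(0, 1, L, device=u.device).unsqueeze(-1)
                k = self.filter_mlp(pos)               # (L, D) implicit filter
                # Convolution along the sequence dimension via FFT.
                u_f = torch.fft.rfft(u, n=2 * L, dim=1)
                k_f = torch.fft.rfft(k, n=2 * L, dim=0).unsqueeze(0)
                return torch.fft.irfft(u_f * k_f, n=2 * L, dim=1)[:, :L, :]

        x = torch.randn(2, 128, 16)
        print(ImplicitLongConv(16)(x).shape)  # torch.Size([2, 128, 16])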
    ALI-DPFL: Differentially Private Federated Learning with Adaptive Local Iterations. (arXiv:2308.10457v2 [cs.LG] UPDATED)
    Federated Learning (FL) is a distributed machine learning technique that allows model training among multiple devices or organizations by sharing training parameters instead of raw data. However, adversaries can still infer individual information through inference attacks (e.g. differential attacks) on these training parameters. As a result, Differential Privacy (DP) has been widely used in FL to prevent such attacks. We consider differentially private federated learning in a resource-constrained scenario, where both the privacy budget and the number of communication rounds are constrained. By theoretically analyzing the convergence, we can find the optimal number of differentially private local iterations for clients between any two sequential global updates. Based on this, we design an algorithm for differentially private federated learning with adaptive local iterations (ALI-DPFL). We evaluate our algorithm on the FashionMNIST and CIFAR10 datasets, and demonstrate significantly better performance than previous work in the resource-constrained scenario.
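    For context, a client's differentially private local iteration is typically a DP-SGD-style step: clip each per-example gradient, average, and add Gaussian noise. The sketch below shows that standard building block on a toy least-squares problem; the paper's actual contribution, choosing the number of such local iterations adaptively from the convergence analysis, is not reproduced, and the clip norm and noise multiplier are illustrative values.

        # Standard DP-SGD-style local step (per-example clipping + Gaussian noise).
        import numpy as np

        rng = np.random.default_rng(0)

        def dp_sgd_step(w, X, y, lr=0.1, clip=1.0, noise_mult=1.0):
            grads = []
            for xi, yi in zip(X, y):                        # per-example gradients
                g = (xi @ w - yi) * xi                      # squared-loss gradient
                g = g * min(1.0, clip / (np.linalg.norm(g) + 1e-12))  # clip
                grads.append(g)
            g_mean = np.mean(grads, axis=0)
            noise = rng.normal(0.0, noise_mult * clip / len(X), size=w.shape)
            return w - lr * (g_mean + noise)

        w = np.zeros(5)
        X = rng.normal(size=(32, 5))
        y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=32)
        for _ in range(100):
            w = dp_sgd_step(w, X, y)
        print(w)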
    Weakly supervised learning for pattern classification in serial femtosecond crystallography. (arXiv:2309.04474v2 [cond-mat.mtrl-sci] UPDATED)
    Serial femtosecond crystallography at X-ray free electron laser facilities opens a new era for the determination of crystal structure. However, the data processing of these experiments faces an unprecedented challenge, because the total number of diffraction patterns needed to determine a high-resolution structure is huge. Machine learning methods are very likely to play important roles in dealing with such a large volume of data. Convolutional neural networks have achieved great success in the field of pattern classification; however, training of the networks needs very large datasets with labels. This heavy dependence on labeled datasets seriously restricts the application of networks, because it is very costly to annotate a large number of diffraction patterns. In this article we present our work on the classification of diffraction patterns by weakly supervised algorithms, with the aim of reducing as much as possible the size of the labeled dataset required for training. Our results show that weakly supervised methods can significantly reduce the number of labeled patterns needed while achieving comparable accuracy to fully supervised methods.
    Federated Learning for Medical Applications: A Taxonomy, Current Trends, Challenges, and Future Research Directions. (arXiv:2208.03392v4 [cs.LG] UPDATED)
    With the advent of the IoT, AI, and ML/DL algorithms, the landscape of data-driven medical applications has emerged as a promising avenue for designing robust and scalable diagnostic and prognostic models from medical data. Consequently, the realm of data-driven medical applications has garnered significant attention spanning academia and industry, ushering in marked enhancements in healthcare delivery quality. Despite these strides, the adoption of AI-driven medical applications remains hindered by formidable challenges, including the arduous task of meeting security, privacy, and quality of service (QoS) standards. Recent developments in federated learning (FL) have made it possible to train complex machine-learned models in a distributed manner, and FL has become an active research domain, particularly for processing medical data at the edge of the network in a decentralized way to preserve privacy and address security concerns. To this end, this survey paper highlights the current state and future of FL technology in medical applications where data sharing is a significant burden. We delve into the contemporary research trends and their outcomes, unravelling the intricacies of designing reliable and scalable FL models. Our survey outlines the foundational statistical predicaments of FL, confronts device-related obstacles, delves into security challenges, and navigates the intricate terrain of privacy concerns, all while spotlighting its transformative potential within the medical domain. A primary focus of our study rests on medical applications, where we underscore the weighty burden of global cancer and illuminate the potency of FL in engendering computer-aided diagnosis tools that address this challenge with heightened efficacy.
    Drifter: Efficient Online Feature Monitoring for Improved Data Integrity in Large-Scale Recommendation Systems. (arXiv:2309.08617v2 [cs.IR] UPDATED)
    Real-world production systems often grapple with maintaining data quality in large-scale, dynamic streams. We introduce Drifter, an efficient and lightweight system for online feature monitoring and verification in recommendation use cases. Drifter addresses limitations of existing methods by delivering agile, responsive, and adaptable data quality monitoring, enabling real-time root cause analysis, drift detection and insights into problematic production events. Integrating state-of-the-art online feature ranking for sparse data and anomaly detection ideas, Drifter is highly scalable and resource-efficient, requiring only two threads and less than a gigabyte of RAM per production deployment handling millions of instances per minute. Evaluation on real-world data sets demonstrates Drifter's effectiveness in alerting and mitigating data quality issues, substantially improving reliability and performance of real-time live recommender systems.
    Improving Behavioural Cloning with Positive Unlabeled Learning. (arXiv:2301.11734v2 [cs.LG] UPDATED)
    Learning control policies offline from pre-recorded datasets is a promising avenue for solving challenging real-world problems. However, available datasets are typically of mixed quality, with a limited number of the trajectories that we would consider as positive examples; i.e., high-quality demonstrations. Therefore, we propose a novel iterative learning algorithm for identifying expert trajectories in unlabeled mixed-quality robotics datasets given a minimal set of positive examples, surpassing existing algorithms in terms of accuracy. We show that applying behavioral cloning to the resulting filtered dataset outperforms several competitive offline reinforcement learning and imitation learning baselines. We perform experiments on a range of simulated locomotion tasks and on two challenging manipulation tasks on a real robotic system; in these experiments, our method showcases state-of-the-art performance. Our website: \url{https://sites.google.com/view/offline-policy-learning-pubc}.
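    One generic positive-unlabeled recipe consistent with the description above is to iterate between training a positives-vs-unlabeled classifier and promoting confidently positive unlabeled samples into the positive set. The sketch below shows that generic loop on synthetic feature vectors; the features, thresholds, and classifier are placeholders, and the paper's specific algorithm may differ.

        # Minimal PU-style sketch, not the paper's algorithm: from a few labeled
        # positives, repeatedly train positives-vs-unlabeled and promote confident
        # unlabeled samples into the positive (expert-like) set.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        expert = rng.normal(loc=1.0, size=(200, 8))       # unlabeled expert-like data
        novice = rng.normal(loc=-1.0, size=(200, 8))      # unlabeled low-quality data
        unlabeled = np.vstack([expert, novice])
        positives = rng.normal(loc=1.0, size=(10, 8))     # small labeled positive set

        pos, unl = positives.copy(), unlabeled.copy()
        for _ in range(5):
            X = np.vstack([pos, unl])
            y = np.concatenate([np.ones(len(pos)), np.zeros(len(unl))])
            clf = LogisticRegression(max_iter=1000).fit(X, y)
            p = clf.predict_proba(unl)[:, 1]
            keep = p > 0.9                                # promote confident samples
            if not keep.any():
                break
            pos = np.vstack([pos, unl[keep]])
            unl = unl[~keep]

        print(f"identified {len(pos) - len(positives)} additional expert-like samples")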
    Variational Connectionist Temporal Classification for Order-Preserving Sequence Modeling. (arXiv:2309.11983v1 [cs.LG])
    Connectionist temporal classification (CTC) is commonly adopted for sequence modeling tasks like speech recognition, where it is necessary to preserve order between the input and target sequences. However, CTC is only applied to deterministic sequence models, where the latent space is discontinuous and sparse, which in turn makes them less capable of handling data variability when compared to variational models. In this paper, we integrate CTC with a variational model and derive loss functions that can be used to train more generalizable sequence models that preserve order. Specifically, we derive two versions of the novel variational CTC based on two reasonable assumptions, the first being that the variational latent variables at each time step are conditionally independent; and the second being that these latent variables are Markovian. We show that both loss functions allow direct optimization of the variational lower bound for the model log-likelihood, and present computationally tractable forms for implementing them.
    PrNet: A Neural Network for Correcting Pseudoranges to Improve Positioning with Android Raw GNSS Measurements. (arXiv:2309.12204v1 [cs.LG])
    We present a neural network for mitigating pseudorange bias to improve localization performance with data collected from Android smartphones. We represent pseudorange bias using a pragmatic satellite-wise Multiple Layer Perceptron (MLP), the inputs of which are six satellite-receiver-context-related features derived from Android raw Global Navigation Satellite System (GNSS) measurements. To supervise the training process, we carefully calculate the target values of pseudorange bias using location ground truth and smoothing techniques and optimize a loss function containing the estimation residuals of smartphone clock bias. During the inference process, we employ model-based localization engines to compute locations with pseudoranges corrected by the neural network. Consequently, this hybrid pipeline can attend to both pseudorange bias and noise. We evaluate the framework on an open dataset and consider four application scenarios for investigating fingerprinting and cross-trace localization in rural and urban areas. Extensive experiments demonstrate that the proposed framework outperforms model-based and state-of-the-art data-driven approaches.
    Model-based Deep Learning for High-Dimensional Periodic Structures. (arXiv:2309.12223v1 [eess.SP])
    This work presents a deep learning surrogate model for the fast simulation of high-dimensional frequency selective surfaces. We consider unit cells which are built as multiple concatenated stacks of screens, and their design requires control over many geometrical degrees of freedom. Thanks to the introduction of physical insight into the model, it can produce accurate predictions of the S-parameters of a given structure after training with a reduced dataset. The proposed model is highly versatile and can be used with any kind of frequency selective surface, based on either perforations or patches of any arbitrary geometry. Numerical examples are presented here for the case of frequency selective surfaces composed of screens with rectangular perforations, showing excellent agreement between the predicted performance and that obtained with a full-wave simulator.
    Domain-knowledge Inspired Pseudo Supervision (DIPS) for Unsupervised Image-to-Image Translation Models to Support Cross-Domain Classification. (arXiv:2303.10310v3 [cs.CV] UPDATED)
    The ability to classify images is dependent on having access to large labeled datasets and testing on data from the same domain that the model was trained on. Classification becomes more challenging when dealing with new data from a different domain, where gathering and especially labeling a larger image dataset for retraining a classification model requires a labor-intensive human effort. Cross-domain classification frameworks were developed to handle this data domain shift problem by utilizing unsupervised image-to-image translation models to translate an input image from the unlabeled domain to the labeled domain. The problem with these unsupervised models lies in their unsupervised nature. For lack of annotations, it is not possible to use the traditional supervised metrics to evaluate these translation models and pick the best-saved checkpoint model. This paper introduces a new method called Domain-knowledge Inspired Pseudo Supervision (DIPS) which utilizes domain-informed Gaussian Mixture Models to generate pseudo annotations to enable the use of traditional supervised metrics. This method was designed specifically to support cross-domain classification applications, contrary to other typically used metrics such as the FID, which were designed to evaluate the model in terms of the quality of the generated image from a human-eye perspective. DIPS proves its effectiveness by outperforming various GAN evaluation metrics, including FID, when selecting the optimal saved checkpoint model. It is also evaluated against truly supervised metrics. Furthermore, DIPS showcases its robustness and interpretability by demonstrating a strong correlation with truly supervised metrics, highlighting its superiority over existing state-of-the-art alternatives. The code and data to replicate the results can be found on the official Github repository: https://github.com/Hindawi91/DIPS
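    The pseudo-supervision idea can be illustrated in a few lines: fit a Gaussian Mixture Model on domain-informed features of translated images, treat the components as pseudo labels, and score a checkpoint by agreement with the downstream classifier. The sketch below uses random placeholder features and a two-class setup; it is an interpretation of the abstract, not the repository's code.

        # Sketch of GMM-based pseudo supervision (placeholder features, not DIPS code).
        import numpy as np
        from sklearn.mixture import GaussianMixture

        rng = np.random.default_rng(0)
        # Placeholder "domain-informed" features extracted from translated images.
        feats = np.vstack([rng.normal(0, 1, (100, 6)), rng.normal(3, 1, (100, 6))])
        classifier_preds = np.array([0] * 100 + [1] * 100)   # placeholder predictions

        gmm = GaussianMixture(n_components=2, random_state=0).fit(feats)
        pseudo = gmm.predict(feats)
        # Component indices are arbitrary, so score the best label permutation.
        agreement = max(np.mean(pseudo == classifier_preds),
                        np.mean(pseudo == 1 - classifier_preds))
        print(f"checkpoint pseudo-supervised score: {agreement:.2f}")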
    DREAM: A Dynamic Scheduler for Dynamic Real-time Multi-model ML Workloads. (arXiv:2212.03414v2 [cs.DC] UPDATED)
    Emerging real-time multi-model ML (RTMM) workloads such as AR/VR and drone control involve dynamic behaviors at various granularities: task, model, and layers within a model. Such dynamic behaviors introduce new challenges to the system software in an ML system since the overall system load is not completely predictable, unlike traditional ML workloads. In addition, RTMM workloads require real-time processing, involve highly heterogeneous models, and target resource-constrained devices. Under such circumstances, developing an effective scheduler gains more importance to better utilize underlying hardware considering the unique characteristics of RTMM workloads. Therefore, we propose a new scheduler, DREAM, which effectively handles various dynamicity in RTMM workloads targeting multi-accelerator systems. DREAM quantifies the unique requirements for RTMM workloads and utilizes the quantified scores to drive scheduling decisions, considering the current system load and other inference jobs on different models and input frames. DREAM utilizes tunable parameters that provide fast and effective adaptivity to dynamic workload changes. In our evaluation of five scenarios of RTMM workload, DREAM reduces the overall UXCost, which is an equivalent metric of the energy-delay product (EDP) for RTMM defined in the paper, by 32.2% and 50.0% in the geometric mean (up to 80.8% and 97.6%) compared to state-of-the-art baselines, which shows the efficacy of our scheduling methodology.
    SALSA-CLRS: A Sparse and Scalable Benchmark for Algorithmic Reasoning. (arXiv:2309.12253v1 [cs.LG])
    We introduce an extension to the CLRS algorithmic learning benchmark, prioritizing scalability and the utilization of sparse representations. Many algorithms in CLRS require global memory or information exchange, mirrored in its execution model, which constructs fully connected (not sparse) graphs based on the underlying problem. Despite CLRS's aim of assessing how effectively learned algorithms can generalize to larger instances, the existing execution model becomes a significant constraint due to its demanding memory requirements and runtime (hard to scale). However, many important algorithms do not demand a fully connected graph; these algorithms, primarily distributed in nature, align closely with the message-passing paradigm employed by Graph Neural Networks. Hence, we propose SALSA-CLRS, an extension of the current CLRS benchmark specifically with scalability and sparseness in mind. Our approach includes adapted algorithms from the original CLRS benchmark and introduces new problems from distributed and randomized algorithms. Moreover, we perform a thorough empirical evaluation of our benchmark. Code is publicly available at https://github.com/jkminder/SALSA-CLRS.
    Class-wise Classifier Design Capable of Continual Learning using Adaptive Resonance Theory-based Topological Clustering. (arXiv:2203.09879v2 [cs.LG] UPDATED)
    This paper proposes a supervised classification algorithm capable of continual learning by utilizing an Adaptive Resonance Theory (ART)-based growing self-organizing clustering algorithm. The ART-based clustering algorithm is theoretically capable of continual learning, and the proposed algorithm independently applies it to each class of training data for generating classifiers. Whenever an additional training data set from a new class is given, a new ART-based clustering will be defined in a different learning space. Thanks to the above-mentioned features, the proposed algorithm realizes continual learning capability. Simulation experiments showed that the proposed algorithm has superior classification performance compared with state-of-the-art clustering-based classification algorithms capable of continual learning.
    Decision-making and control with diffractive optical networks. (arXiv:2212.11278v3 [cs.LG] UPDATED)
    The ultimate goal of artificial intelligence is to mimic the human brain to perform decision-making and control directly from high-dimensional sensory input. Diffractive optical networks provide a promising solution for implementing artificial intelligence with high-speed and low-power consumption. Most of the reported diffractive optical networks focus on single or multiple tasks that do not involve environmental interaction, such as object recognition and image classification. In contrast, networks capable of performing decision-making and control have not yet been developed to our knowledge. Here, we propose using deep reinforcement learning to implement diffractive optical networks that imitate human-level decision-making and control capability. Such networks, taking advantage of a residual architecture, allow for finding optimal control policies through interaction with the environment and can be readily implemented with existing optical devices. The superior performance of these networks is verified by engaging three types of classic games, Tic-Tac-Toe, Super Mario Bros., and Car Racing. Finally, we present an experimental demonstration of playing Tic-Tac-Toe by leveraging diffractive optical networks based on a spatial light modulator. Our work represents a solid step forward in advancing diffractive optical networks, which promises a fundamental shift from the target-driven control of a pre-designed state for simple recognition or classification tasks to the high-level sensory capability of artificial intelligence. It may find exciting applications in autonomous driving, intelligent robots, and intelligent manufacturing.
    Persistent Homology of the Multiscale Clustering Filtration. (arXiv:2305.04281v2 [math.AT] UPDATED)
    In many applications in data clustering, it is desirable to find not just a single partition into clusters but a sequence of partitions describing the data at different scales, or levels of coarseness. A natural problem then is to analyse and compare the (not necessarily hierarchical) sequences of partitions that underpin such multiscale descriptions of data. Here, we introduce a filtration of abstract simplicial complexes, denoted the Multiscale Clustering Filtration (MCF), which encodes arbitrary patterns of cluster assignments across scales, and we prove that the MCF produces stable persistence diagrams. We then show that the zero-dimensional persistent homology of the MCF measures the degree of hierarchy in the sequence of partitions, and that the higher-dimensional persistent homology tracks the emergence and resolution of conflicts between cluster assignments across the sequence of partitions. To broaden the theoretical foundations of the MCF, we also provide an equivalent construction via a nerve complex filtration, and we show that in the hierarchical case, the MCF reduces to a Vietoris-Rips filtration of an ultrametric space. We briefly illustrate how the MCF can serve to characterise multiscale clustering structures in numerical experiments on synthetic data.
    Self-supervised learning unveils change in urban housing from street-level images. (arXiv:2309.11354v2 [cs.CV] UPDATED)
    Cities around the world face a critical shortage of affordable and decent housing. Despite its critical importance for policy, our ability to effectively monitor and track progress in urban housing is limited. Deep learning-based computer vision methods applied to street-level images have been successful in the measurement of socioeconomic and environmental inequalities but did not fully utilize temporal images to track urban change as time-varying labels are often unavailable. We used self-supervised methods to measure change in London using 15 million street images taken between 2008 and 2021. Our novel adaptation of Barlow Twins, Street2Vec, embeds urban structure while being invariant to seasonal and daily changes without manual annotations. It outperformed generic embeddings, successfully identified point-level change in London's housing supply from street-level images, and distinguished between major and minor change. This capability can provide timely information for urban planning and policy decisions toward more liveable, equitable, and sustainable cities.
    Environment-biased Feature Ranking for Novelty Detection Robustness. (arXiv:2309.12301v1 [cs.LG])
    We tackle the problem of robust novelty detection, where we aim to detect novelties in terms of semantic content while being invariant to changes in other, irrelevant factors. Specifically, we operate in a setup with multiple environments, where we determine the set of features that are associated more with the environments than with the content relevant for the task. Thus, we propose a method that starts with a pretrained embedding and a multi-environment setup and ranks the features based on their environment focus. First, we compute a per-feature score based on the variance of the feature distribution between environments. Next, we show that by dropping the highly scored features, we remove spurious correlations and improve the overall performance by up to 6%, in both covariance and sub-population shift cases, on both a real and a synthetic benchmark that we introduce for this task.
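    A minimal sketch of the scoring step follows: each embedding dimension is scored by how much its statistics vary across environments, and the highest-scoring dimensions are dropped before novelty detection. Using the variance of per-environment feature means is one plausible instantiation of the score described above; the data and the drop fraction below are placeholders.

        # Sketch: rank embedding features by their variance across environments, then
        # drop the most environment-biased ones. Data are random placeholders.
        import numpy as np

        rng = np.random.default_rng(0)
        n_envs, n_per_env, d = 4, 256, 128
        # embeddings[e] holds the pretrained features for environment e.
        embeddings = [rng.normal(loc=e * 0.1, size=(n_per_env, d)) for e in range(n_envs)]

        env_means = np.stack([emb.mean(axis=0) for emb in embeddings])   # (n_envs, d)
        score = env_means.var(axis=0)                 # per-feature variance across envs

        drop_frac = 0.25                              # fraction of features to drop
        keep = np.argsort(score)[: int(d * (1 - drop_frac))]
        filtered = [emb[:, keep] for emb in embeddings]
        print(filtered[0].shape)                      # (256, 96)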
    CoDi: Co-evolving Contrastive Diffusion Models for Mixed-type Tabular Synthesis. (arXiv:2304.12654v2 [cs.LG] UPDATED)
    With the growing attention to tabular data, attempts to apply synthetic tables to various tasks have expanded toward many scenarios. Owing to recent advances in generative modeling, fake data generated by tabular data synthesis models have become sophisticated and realistic. However, there still exists a difficulty in modeling discrete variables (columns) of tabular data. In this work, we propose to process continuous and discrete variables separately (but conditioned on each other) by two diffusion models. The two diffusion models are co-evolved during training by reading conditions from each other. To further bind the diffusion models, moreover, we introduce a contrastive learning method with a negative sampling method. In our experiments with 11 real-world tabular datasets and 8 baseline methods, we prove the efficacy of the proposed method, called CoDi.
    On the different regimes of Stochastic Gradient Descent. (arXiv:2309.10688v2 [cs.LG] UPDATED)
    Modern deep networks are trained with stochastic gradient descent (SGD) whose key parameters are the number of data considered at each step or batch size $B$, and the step size or learning rate $\eta$. For small $B$ and large $\eta$, SGD corresponds to a stochastic evolution of the parameters, whose noise amplitude is governed by the `temperature' $T\equiv \eta/B$. Yet this description is observed to break down for sufficiently large batches $B\geq B^*$, or simplifies to gradient descent (GD) when the temperature is sufficiently small. Understanding where these cross-overs take place remains a central challenge. Here we resolve these questions for a teacher-student perceptron classification model, and show empirically that our key predictions still apply to deep networks. Specifically, we obtain a phase diagram in the $B$-$\eta$ plane that separates three dynamical phases: $\textit{(i)}$ a noise-dominated SGD governed by temperature, $\textit{(ii)}$ a large-first-step-dominated SGD and $\textit{(iii)}$ GD. These different phases also correspond to different regimes of generalization error. Remarkably, our analysis reveals that the batch size $B^*$ separating regimes $\textit{(i)}$ and $\textit{(ii)}$ scales with the size $P$ of the training set, with an exponent that characterizes the hardness of the classification problem.
    Analysis and Comparison of Classification Metrics. (arXiv:2209.05355v4 [cs.LG] UPDATED)
    A variety of different performance metrics are commonly used in the machine learning literature for the evaluation of classification systems. Some of the most common ones for measuring quality of hard decisions are standard and balanced accuracy, standard and balanced error rate, F-beta score, and Matthews correlation coefficient (MCC). In this document, we review the definition of these and other metrics and compare them with the expected cost (EC), a metric introduced in every statistical learning course but rarely used in the machine learning literature. We show that both the standard and balanced error rates are special cases of the EC. Further, we show its relation with F-beta score and MCC and argue that EC is superior to these traditional metrics for being based on first principles from statistics, and for being more general, interpretable, and adaptable to any application scenario. The metrics mentioned above measure the quality of hard decisions. Yet, most modern classification systems output continuous scores for the classes which we may want to evaluate directly. Metrics for measuring the quality of system scores include the area under the ROC curve, equal error rate, cross-entropy, Brier score, and Bayes EC or Bayes risk, among others. The last three metrics are special cases of a family of metrics given by the expected value of proper scoring rules (PSRs). We review the theory behind these metrics, showing that they are a principled way to measure the quality of the posterior probabilities produced by a system. Finally, we show how to use these metrics to compute a system's calibration loss and compare this metric with the widely-used expected calibration error (ECE), arguing that calibration loss based on PSRs is superior to the ECE for being more interpretable, more general, and directly applicable to the multi-class case, among other reasons.
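    As a concrete illustration of the expected cost for hard decisions, the sketch below computes EC $= \sum_{i,j} P(i)\,P(\text{decide } j \mid i)\,c_{ij}$ from empirical counts and verifies that, with a 0-1 cost matrix, it coincides with the standard error rate, one of the special cases mentioned above. The labels are toy data.

        # Expected cost (EC) from an empirical confusion matrix and a cost matrix.
        import numpy as np

        y_true = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2, 2])
        y_pred = np.array([0, 1, 1, 1, 0, 2, 2, 1, 2, 2])

        K = 3
        priors = np.bincount(y_true, minlength=K) / len(y_true)
        conf = np.zeros((K, K))
        for t, p in zip(y_true, y_pred):
            conf[t, p] += 1
        cond = conf / conf.sum(axis=1, keepdims=True)      # P(decision | true class)

        cost_01 = 1.0 - np.eye(K)                          # 0-1 cost matrix
        ec = np.sum(priors[:, None] * cond * cost_01)
        print(ec, np.mean(y_true != y_pred))               # identical: error rate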
    Bridging the Gap: Learning Pace Synchronization for Open-World Semi-Supervised Learning. (arXiv:2309.11930v1 [cs.LG])
    In open-world semi-supervised learning, a machine learning model is tasked with uncovering novel categories from unlabeled data while maintaining performance on seen categories from labeled data. The central challenge is the substantial learning gap between seen and novel categories, as the model learns the former faster due to accurate supervisory information. To address this, we introduce 1) an adaptive margin loss based on estimated class distribution, which encourages a large negative margin for samples in seen classes, to synchronize learning paces, and 2) pseudo-label contrastive clustering, which pulls together samples which are likely from the same class in the output space, to enhance novel class discovery. Our extensive evaluations on multiple datasets demonstrate that existing models still hinder novel class learning, whereas our approach strikingly balances both seen and novel classes, achieving a remarkable 3% average accuracy increase on the ImageNet dataset compared to the prior state-of-the-art. Additionally, we find that fine-tuning the self-supervised pre-trained backbone significantly boosts performance over the default in prior literature. After our paper is accepted, we will release the code.
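    One plausible form of the adaptive margin loss described above is a cross-entropy in which the target-class logit is reduced by a margin that grows with the estimated frequency of that class, so that well-supervised seen classes learn more slowly. The sketch below implements that reading with placeholder frequencies; the exact loss in the paper may differ.

        # Hedged sketch of a margin-adjusted cross-entropy: the target logit is
        # reduced in proportion to the estimated frequency of the target class.
        import torch
        import torch.nn.functional as F

        def adaptive_margin_ce(logits, targets, est_class_freq, scale=2.0):
            margins = scale * est_class_freq[targets]             # (B,)
            adjusted = logits.clone()
            adjusted[torch.arange(len(targets)), targets] -= margins
            return F.cross_entropy(adjusted, targets)

        logits = torch.randn(8, 5, requires_grad=True)
        targets = torch.randint(0, 5, (8,))
        freq = torch.tensor([0.4, 0.3, 0.2, 0.05, 0.05])          # placeholder estimate
        loss = adaptive_margin_ce(logits, targets, freq)
        loss.backward()
        print(loss.item())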
    Convergence and Recovery Guarantees of Unsupervised Neural Networks for Inverse Problems. (arXiv:2309.12128v1 [cs.LG])
    Neural networks have become a prominent approach to solve inverse problems in recent years. While a plethora of such methods was developed to solve inverse problems empirically, we are still lacking clear theoretical guarantees for these methods. On the other hand, many works proved convergence to optimal solutions of neural networks in a more general setting using overparametrization as a way to control the Neural Tangent Kernel. In this work we investigate how to bridge these two worlds and we provide deterministic convergence and recovery guarantees for the class of unsupervised feedforward multilayer neural networks trained to solve inverse problems. We also derive overparametrization bounds under which a two-layer Deep Inverse Prior network with a smooth activation function will benefit from our guarantees.
    Robust Approximation Algorithms for Non-monotone $k$-Submodular Maximization under a Knapsack Constraint. (arXiv:2309.12025v1 [cs.DS])
    The problem of non-monotone $k$-submodular maximization under a knapsack constraint ($\kSMK$) over a ground set of size $n$ arises in many machine learning applications, such as data summarization and information propagation. However, existing algorithms for the problem face the questions of how to handle the non-monotone case and how to quickly return a good solution when the data are large. This paper introduces two deterministic approximation algorithms for the problem that competitively improve the query complexity of existing algorithms. Our first algorithm, $\LAA$, returns an approximation ratio of $1/19$ within $O(nk)$ query complexity. The second one, $\RLA$, improves the approximation ratio to $1/5-\epsilon$ in $O(nk)$ queries, where $\epsilon$ is an input parameter. Our algorithms are the first to provide constant approximation ratios within only $O(nk)$ query complexity for the non-monotone objective. They therefore require fewer queries than state-of-the-art algorithms by a factor of $\Omega(\log n)$. Besides the theoretical analysis, we evaluate our algorithms in several experiments on instances of Influence Maximization and Sensor Placement. The results confirm that our algorithms match the theoretical quality of cutting-edge techniques while significantly reducing the number of queries.
    PIE: Simulating Disease Progression via Progressive Image Editing. (arXiv:2309.11745v1 [eess.IV])
    Disease progression simulation is a crucial area of research that has significant implications for clinical diagnosis, prognosis, and treatment. One major challenge in this field is the lack of continuous medical imaging monitoring of individual patients over time. To address this issue, we develop a novel framework termed Progressive Image Editing (PIE) that enables controlled manipulation of disease-related image features, facilitating precise and realistic disease progression simulation. Specifically, we leverage recent advancements in text-to-image generative models to simulate disease progression accurately and personalize it for each patient. We theoretically analyze the iterative refining process in our framework as a gradient descent with an exponentially decayed learning rate. To validate our framework, we conduct experiments in three medical imaging domains. Our results demonstrate the superiority of PIE over existing methods such as Stable Diffusion Walk and Style-Based Manifold Extrapolation based on CLIP score (Realism) and Disease Classification Confidence (Alignment). Our user study collected feedback from 35 veteran physicians to assess the generated progressions. Remarkably, 76.2% of the feedback agrees with the fidelity of the generated progressions. To our best knowledge, PIE is the first of its kind to generate disease progression images meeting real-world standards. It is a promising tool for medical research and clinical practice, potentially allowing healthcare providers to model disease trajectories over time, predict future treatment responses, and improve patient outcomes.
    Federated Learning with Neural Graphical Models. (arXiv:2309.11680v1 [cs.LG])
    Federated Learning (FL) addresses the need to create models based on proprietary data in such a way that multiple clients retain exclusive control over their data, while all benefit from improved model accuracy due to pooled resources. Recently proposed Neural Graphical Models (NGMs) are Probabilistic Graphical Models that utilize the expressive power of neural networks to learn complex non-linear dependencies between the input features. They learn to capture the underlying data distribution and have efficient algorithms for inference and sampling. We develop an FL framework which maintains a global NGM model that learns the averaged information from the local NGM models while keeping the training data within the client's environment. Our design, FedNGMs, avoids the pitfalls and shortcomings of neuron-matching frameworks like Federated Matched Averaging, which suffer from model parameter explosion. Our global model size remains constant throughout the process. In cases where clients have local variables that are not part of the combined global distribution, we propose a `Stitching' algorithm, which personalizes the global NGM models by merging the additional variables using the client's data. FedNGMs is robust to data heterogeneity, a large number of participants, and limited communication bandwidth.
    Stock Market Sentiment Classification and Backtesting via Fine-tuned BERT. (arXiv:2309.11979v1 [q-fin.CP])
    With the rapid development of big data and computing devices, low-latency automatic trading platforms based on real-time information acquisition have become the main components of the stock trading market, so quantitative trading has received widespread attention. In non-strongly efficient trading markets, human emotions and expectations often dominate market trends and trading decisions. Starting from the theory of emotion and taking East Money as an example, this paper crawls user comment titles from its corresponding stock bar and performs data cleaning. A BERT natural language processing model is then constructed and fine-tuned on existing annotated data sets. The experimental results show that the fine-tuned model improves on both the original model and the baseline model to varying degrees. Based on this model, the crawled user comments are labeled with emotional polarity, and the resulting labels are combined with the Alpha191 model in a regression, yielding significant regression results. The regression model is then used to predict the average price change over the next five days, which serves as a signal to guide automatic trading. The experimental results show that incorporating emotional factors increased the return rate by 73.8\% compared to the baseline during the trading period, and by 32.41\% compared to the original Alpha191 model. Finally, we discuss the advantages and disadvantages of incorporating emotional factors into quantitative trading and give possible directions for further research.
    Safe Hierarchical Reinforcement Learning for CubeSat Task Scheduling Based on Energy Consumption. (arXiv:2309.12004v1 [cs.LG])
    This paper presents a Hierarchical Reinforcement Learning methodology tailored for optimizing CubeSat task scheduling in Low Earth Orbits (LEO). Incorporating a high-level policy for global task distribution and a low-level policy for real-time adaptations as a safety mechanism, our approach integrates the Similarity Attention-based Encoder (SABE) for task prioritization and an MLP estimator for energy consumption forecasting. Integrating this mechanism creates a safe and fault-tolerant system for CubeSat task scheduling. Simulation results validate the superior convergence and task success rate of the Hierarchical Reinforcement Learning approach, outperforming both the MADDPG model and traditional random scheduling across multiple CubeSat configurations.  ( 2 min )
    Predictability and Comprehensibility in Post-Hoc XAI Methods: A User-Centered Analysis. (arXiv:2309.11987v1 [cs.LG])
    Post-hoc explainability methods aim to clarify predictions of black-box machine learning models. However, it is still largely unclear how well users comprehend the provided explanations and whether these increase users' ability to predict the model behavior. We approach this question by conducting a user study to evaluate comprehensibility and predictability in two widely used tools: LIME and SHAP. Moreover, we investigate the effect of counterfactual explanations and misclassifications on users' ability to understand and predict the model behavior. We find that the comprehensibility of SHAP is significantly reduced when explanations are provided for samples near a model's decision boundary. Furthermore, we find that counterfactual explanations and misclassifications can significantly increase users' understanding of how a machine learning model is making decisions. Based on our findings, we also derive design recommendations for future post-hoc explainability methods with increased comprehensibility and predictability.
    SupeRBNN: Randomized Binary Neural Network Using Adiabatic Superconductor Josephson Devices. (arXiv:2309.12212v1 [cs.ET])
    Adiabatic Quantum-Flux-Parametron (AQFP) is a superconducting logic with extremely high energy efficiency. By employing the distinct polarity of current to denote logic `0' and `1', AQFP devices serve as excellent carriers for binary neural network (BNN) computations. Although recent research has made initial strides toward developing an AQFP-based BNN accelerator, several critical challenges remain, preventing the design from being a comprehensive solution. In this paper, we propose SupeRBNN, an AQFP-based randomized BNN acceleration framework that leverages software-hardware co-optimization to eventually make the AQFP devices a feasible solution for BNN acceleration. Specifically, we investigate the randomized behavior of the AQFP devices and analyze the impact of crossbar size on current attenuation, subsequently formulating the current amplitude into the values suitable for use in BNN computation. To tackle the accumulation problem and improve overall hardware performance, we propose a stochastic computing-based accumulation module and a clocking scheme adjustment-based circuit optimization method. We validate our SupeRBNN framework across various datasets and network architectures, comparing it with implementations based on different technologies, including CMOS, ReRAM, and superconducting RSFQ/ERSFQ. Experimental results demonstrate that our design achieves an energy efficiency of approximately 7.8x10^4 times higher than that of the ReRAM-based BNN framework while maintaining a similar level of model accuracy. Furthermore, when compared with superconductor-based counterparts, our framework demonstrates at least two orders of magnitude higher energy efficiency.  ( 3 min )
    BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model. (arXiv:2309.11568v1 [cs.AI])
    We introduce the Bittensor Language Model, called "BTLM-3B-8K", a new state-of-the-art 3 billion parameter open-source language model. BTLM-3B-8K was trained on 627B tokens from the SlimPajama dataset with a mixture of 2,048 and 8,192 context lengths. BTLM-3B-8K outperforms all existing 3B parameter models by 2-5.5% across downstream tasks. BTLM-3B-8K is even competitive with some 7B parameter models. Additionally, BTLM-3B-8K provides excellent long context performance, outperforming MPT-7B-8K and XGen-7B-8K on tasks up to 8,192 context length. We trained the model on a cleaned and deduplicated SlimPajama dataset; aggressively tuned the \textmu P hyperparameters and schedule; used ALiBi position embeddings; and adopted the SwiGLU nonlinearity. On Hugging Face, the most popular models have 7B parameters, indicating that users prefer the quality-size ratio of 7B models. Compacting the 7B parameter model to one with 3B parameters, with little performance impact, is an important milestone. BTLM-3B-8K needs only 3GB of memory with 4-bit precision and takes 2.5x less inference compute than 7B models, helping to open up access to a powerful language model on mobile and edge devices. BTLM-3B-8K is available under an Apache 2.0 license on Hugging Face: https://huggingface.co/cerebras/btlm-3b-8k-base.
    AdBooster: Personalized Ad Creative Generation using Stable Diffusion Outpainting. (arXiv:2309.11507v1 [cs.IR])
    In digital advertising, the selection of the optimal item (recommendation) and its best creative presentation (creative optimization) have traditionally been considered separate disciplines. However, both contribute significantly to user satisfaction, underpinning our assumption that user satisfaction relies on both an item's relevance and its presentation, particularly in the case of visual creatives. In response, we introduce the task of {\itshape Generative Creative Optimization (GCO)}, which proposes the use of generative models for creative generation that incorporate user interests, and {\itshape AdBooster}, a model for personalized ad creatives based on the Stable Diffusion outpainting architecture. This model uniquely incorporates user interests both during fine-tuning and at generation time. To further improve AdBooster's performance, we also introduce an automated data augmentation pipeline. Through our experiments on simulated data, we validate AdBooster's effectiveness in generating more relevant creatives than default product images, showing its potential for enhancing user engagement.
    State Augmented Constrained Reinforcement Learning: Overcoming the Limitations of Learning with Rewards. (arXiv:2102.11941v2 [cs.LG] UPDATED)
    A common formulation of constrained reinforcement learning involves multiple rewards that must individually accumulate to given thresholds. In this class of problems, we show a simple example in which the desired optimal policy cannot be induced by any weighted linear combination of rewards. Hence, there exist constrained reinforcement learning problems for which neither regularized nor classical primal-dual methods yield optimal policies. This work addresses this shortcoming by augmenting the state with Lagrange multipliers and reinterpreting primal-dual methods as the portion of the dynamics that drives the multipliers' evolution. This approach provides a systematic state augmentation procedure that is guaranteed to solve reinforcement learning problems with constraints. Thus, as we illustrate by an example, while previous methods can fail at finding optimal policies, running the dual dynamics while executing the augmented policy yields an algorithm that provably samples actions from the optimal policy.  ( 2 min )
    Proportional Response: Contextual Bandits for Simple and Cumulative Regret Minimization. (arXiv:2307.02108v2 [cs.LG] UPDATED)
    Simple regret minimization is a critical problem in learning optimal treatment assignment policies across various domains, including healthcare and e-commerce. However, it remains understudied in the contextual bandit setting. We propose a new family of computationally efficient bandit algorithms for the stochastic contextual bandit settings, with the flexibility to be adapted for cumulative regret minimization (with near-optimal minimax guarantees) and simple regret minimization (with SOTA guarantees). Furthermore, our algorithms adapt to model misspecification and extend to the continuous arm settings. These advantages come from constructing and relying on "conformal arm sets" (CASs), which provide a set of arms at every context that encompass the context-specific optimal arm with some probability across the context distribution. Our positive results on simple and cumulative regret guarantees are contrasted by a negative result, which shows that an algorithm can't achieve instance-dependent simple regret guarantees while simultaneously achieving minimax optimal cumulative regret guarantees.
    SAM-OCTA: A Fine-Tuning Strategy for Applying Foundation Model to OCTA Image Segmentation Tasks. (arXiv:2309.11758v1 [cs.CV])
    In the analysis of optical coherence tomography angiography (OCTA) images, segmenting specific targets is necessary. Existing methods typically train on supervised datasets with limited samples (approximately a few hundred), which can lead to overfitting. To address this, we adopt the low-rank adaptation technique for foundation model fine-tuning and propose corresponding prompt point generation strategies to process various segmentation tasks on OCTA datasets. This method, named SAM-OCTA, has been evaluated on the publicly available OCTA-500 dataset. While achieving state-of-the-art performance metrics, it accomplishes local vessel segmentation as well as effective artery-vein segmentation, which was not well solved in previous works. The code is available at: https://github.com/ShellRedia/SAM-OCTA.
    Graph Neural Networks for the Offline Nanosatellite Task Scheduling Problem. (arXiv:2303.13773v2 [cs.LG] UPDATED)
    This study investigates how to schedule nanosatellite tasks more efficiently using Graph Neural Networks (GNNs). In the Offline Nanosatellite Task Scheduling (ONTS) problem, the goal is to find the optimal schedule for tasks to be carried out in orbit while taking into account Quality-of-Service (QoS) considerations such as priority, minimum and maximum activation events, execution time-frames, periods, and execution windows, as well as constraints on the satellite's power resources and the complexity of energy harvesting and management. The ONTS problem has been approached using conventional mathematical formulations and exact methods, but their applicability to challenging cases of the problem is limited. This study examines the use of GNNs in this context, which have been effectively applied to optimization problems such as the traveling salesman, scheduling, and facility placement problems. More specifically, we investigate whether GNNs can learn the complex structure of the ONTS problem with respect to feasibility and optimality of candidate solutions. Furthermore, we evaluate using GNN-based heuristics to provide better solutions (w.r.t. the objective value) to the ONTS problem and reduce the optimization cost. Our experiments show that GNNs are not only able to learn feasibility and optimality for instances of the ONTS problem, but they can generalize to harder instances than those seen during training. Furthermore, the GNN-based heuristics improved the expected objective value of the best solution found under the time limit by 45% and reduced the expected time to find a feasible solution by 35%, compared to the SCIP (Solving Constraint Integer Programs) solver in its off-the-shelf configuration.
    Crowdotic: A Privacy-Preserving Hospital Waiting Room Crowd Density Estimation with Non-speech Audio. (arXiv:2309.10280v2 [cs.SD] UPDATED)
    Privacy-preserving crowd density analysis finds application across a wide range of scenarios, substantially enhancing smart building operation and management while upholding privacy expectations in various spaces. We propose a non-speech audio-based approach for crowd analytics, leveraging a transformer-based model. Our results demonstrate that non-speech audio alone can be used to conduct such analysis with remarkable accuracy. To the best of our knowledge, this is the first time non-speech audio signals have been proposed for predicting occupancy. To accomplish this, we deployed our sensor-based platform in the waiting room of a large hospital with IRB approval over a period of several months to capture non-speech audio and thermal images for the training and evaluation of our models. The proposed non-speech-based approach outperformed the thermal camera-based model and all other baselines. In addition to demonstrating superior performance without utilizing speech audio, we conduct further analysis using differential privacy techniques to provide additional privacy guarantees. Overall, our work demonstrates the viability of employing non-speech audio data for accurate occupancy estimation, while also ensuring the exclusion of speech-related content and providing robust privacy protections through differential privacy guarantees.
    $\lambda$-AC: Learning latent decision-aware models for reinforcement learning in continuous state-spaces. (arXiv:2306.17366v2 [cs.LG] UPDATED)
    The idea of decision-aware model learning, that models should be accurate where it matters for decision-making, has gained prominence in model-based reinforcement learning. While promising theoretical results have been established, the empirical performance of algorithms leveraging a decision-aware loss has been lacking, especially in continuous control problems. In this paper, we present a study on the necessary components for decision-aware reinforcement learning models and we showcase design choices that enable well-performing algorithms. To this end, we provide a theoretical and empirical investigation into prominent algorithmic ideas in the field. We highlight that empirical design decisions established in the MuZero line of works are vital to achieving good performance for related algorithms, and we showcase differences in behavior between different instantiations of value-aware algorithms in stochastic environments. Using these insights, we propose the Latent Model-Based Decision-Aware Actor-Critic framework ($\lambda$-AC) for decision-aware model-based reinforcement learning in continuous state-spaces and highlight important design choices in different environments.
    Optimal Conditional Inference in Adaptive Experiments. (arXiv:2309.12162v1 [stat.ME])
    We study batched bandit experiments and consider the problem of inference conditional on the realized stopping time, assignment probabilities, and target parameter, where all of these may be chosen adaptively using information up to the last batch of the experiment. Absent further restrictions on the experiment, we show that inference using only the results of the last batch is optimal. When the adaptive aspects of the experiment are known to be location-invariant, in the sense that they are unchanged when we shift all batch-arm means by a constant, we show that there is additional information in the data, captured by one additional linear function of the batch-arm means. In the more restrictive case where the stopping time, assignment probabilities, and target parameter are known to depend on the data only through a collection of polyhedral events, we derive computationally tractable and optimal conditional inference procedures.  ( 2 min )
    t-EER: Parameter-Free Tandem Evaluation of Countermeasures and Biometric Comparators. (arXiv:2309.12237v1 [cs.CR])
    Presentation attack (spoofing) detection (PAD) typically operates alongside biometric verification to improve reliability in the face of spoofing attacks. Even though the two sub-systems operate in tandem to solve the single task of reliable biometric verification, they address different detection tasks and are hence typically evaluated separately. Evidence shows that this approach is suboptimal. We introduce a new metric for the joint evaluation of PAD solutions operating in situ with biometric verification. In contrast to the tandem detection cost function proposed recently, the new tandem equal error rate (t-EER) is parameter free. The combination of two classifiers nonetheless leads to a \emph{set} of operating points at which false alarm and miss rates are equal and also dependent upon the prevalence of attacks. We therefore introduce the \emph{concurrent} t-EER, a unique operating point which is invariable to the prevalence of attacks. Using both modality (and even application) agnostic simulated scores, as well as real scores for a voice biometrics application, we demonstrate application of the t-EER to a wide range of biometric system evaluations under attack. The proposed approach is a strong candidate metric for the tandem evaluation of PAD systems and biometric comparators.
    ForceSight: Text-Guided Mobile Manipulation with Visual-Force Goals. (arXiv:2309.12312v1 [cs.RO])
    We present ForceSight, a system for text-guided mobile manipulation that predicts visual-force goals using a deep neural network. Given a single RGBD image combined with a text prompt, ForceSight determines a target end-effector pose in the camera frame (kinematic goal) and the associated forces (force goal). Together, these two components form a visual-force goal. Prior work has demonstrated that deep models outputting human-interpretable kinematic goals can enable dexterous manipulation by real robots. Forces are critical to manipulation, yet have typically been relegated to lower-level execution in these systems. When deployed on a mobile manipulator equipped with an eye-in-hand RGBD camera, ForceSight performed tasks such as precision grasps, drawer opening, and object handovers with an 81% success rate in unseen environments with object instances that differed significantly from the training data. In a separate experiment, relying exclusively on visual servoing and ignoring force goals dropped the success rate from 90% to 45%, demonstrating that force goals can significantly enhance performance. The appendix, videos, code, and trained models are available at https://force-sight.github.io/.
    A Multi-label Classification Approach to Increase Expressivity of EMG-based Gesture Recognition. (arXiv:2309.12217v1 [eess.SP])
    Objective: The objective of the study is to efficiently increase the expressivity of surface electromyography-based (sEMG) gesture recognition systems. Approach: We use a problem transformation approach, in which actions were subset into two biomechanically independent components - a set of wrist directions and a set of finger modifiers. To maintain fast calibration time, we train models for each component using only individual gestures, and extrapolate to the full product space of combination gestures by generating synthetic data. We collected a supervised dataset with high-confidence ground truth labels in which subjects performed combination gestures while holding a joystick, and conducted experiments to analyze the impact of model architectures, classifier algorithms, and synthetic data generation strategies on the performance of the proposed approach. Main Results: We found that a problem transformation approach using a parallel model architecture in combination with a non-linear classifier, along with restricted synthetic data generation, shows promise in increasing the expressivity of sEMG-based gestures with a short calibration time. Significance: sEMG-based gesture recognition has applications in human-computer interaction, virtual reality, and the control of robotic and prosthetic devices. Existing approaches require exhaustive model calibration. The proposed approach increases expressivity without requiring users to demonstrate all combination gesture classes. Our results may be extended to larger gesture vocabularies and more complicated model architectures.  ( 2 min )
    Smooth ECE: Principled Reliability Diagrams via Kernel Smoothing. (arXiv:2309.12236v1 [cs.LG])
    Calibration measures and reliability diagrams are two fundamental tools for measuring and interpreting the calibration of probabilistic predictors. Calibration measures quantify the degree of miscalibration, and reliability diagrams visualize the structure of this miscalibration. However, the most common constructions of reliability diagrams and calibration measures -- binning and ECE -- both suffer from well-known flaws (e.g. discontinuity). We show that a simple modification fixes both constructions: first smooth the observations using an RBF kernel, then compute the Expected Calibration Error (ECE) of this smoothed function. We prove that with a careful choice of bandwidth, this method yields a calibration measure that is well-behaved in the sense of (B{\l}asiok, Gopalan, Hu, and Nakkiran 2023a) -- a consistent calibration measure. We call this measure the SmoothECE. Moreover, the reliability diagram obtained from this smoothed function visually encodes the SmoothECE, just as binned reliability diagrams encode the BinnedECE. We also provide a Python package with simple, hyperparameter-free methods for measuring and plotting calibration: `pip install relplot`.
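    A rough sketch of the smoothing idea follows, assuming predicted probabilities and binary outcomes: estimate the conditional mean of the outcome given the prediction with an RBF (Gaussian) kernel, then integrate the absolute miscalibration weighted by the smoothed prediction density. The bandwidth here is an arbitrary assumption; the released relplot package implements the principled, hyperparameter-free version.

        import numpy as np

        def smooth_ece(preds, labels, bandwidth=0.05, grid_size=200):
            # Kernel-smoothed calibration error: estimate E[y | f = t] with an RBF kernel
            # and integrate |E[y | f = t] - t| weighted by the smoothed prediction density.
            t = np.linspace(0, 1, grid_size)
            k = np.exp(-0.5 * ((t[:, None] - preds[None, :]) / bandwidth) ** 2)   # (grid, n)
            density = k.mean(axis=1)
            cond_mean = (k * labels[None, :]).sum(axis=1) / np.maximum(k.sum(axis=1), 1e-12)
            return np.trapz(np.abs(cond_mean - t) * density, t) / max(np.trapz(density, t), 1e-12)

        preds = np.random.rand(1000)                           # placeholder predicted probabilities
        labels = (np.random.rand(1000) < preds).astype(float)  # outcomes drawn to be calibrated
        print(smooth_ece(preds, labels))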
    Uplift vs. predictive modeling: a theoretical analysis. (arXiv:2309.12036v1 [cs.LG])
    Despite the growing popularity of machine-learning techniques in decision-making, the added value of causal-oriented strategies with respect to pure machine-learning approaches has rarely been quantified in the literature. These strategies are crucial for practitioners in various domains, such as marketing, telecommunications, health care and finance. This paper presents a comprehensive treatment of the subject, starting from firm theoretical foundations and highlighting the parameters that influence the performance of the uplift and predictive approaches. The paper focuses on the case of a binary outcome and a binary action, and presents a theoretical analysis of uplift modeling, comparing it with the classical predictive approach. The main research contributions of the paper include a new formulation of the measure of profit, a formal proof of the convergence of the uplift curve to the measure of profit, and an illustration, through simulations, of the conditions under which predictive approaches still outperform uplift modeling. We show that the mutual information between the features and the outcome plays a significant role, along with the variance of the estimators, the distribution of the potential outcomes and the underlying costs and benefits of the treatment and the outcome.  ( 2 min )
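    To fix ideas, a standard formalization of the two approaches for a binary action $T\in\{0,1\}$ and binary outcome $Y$ is (the notation is assumed here, not necessarily the paper's): the predictive approach ranks individuals by $\hat{y}(x)=P(Y=1\mid X=x,T=1)$, whereas the uplift approach ranks them by $u(x)=P(Y=1\mid X=x,T=1)-P(Y=1\mid X=x,T=0)$, i.e. by the estimated causal effect of the action rather than by the probability of a positive outcome under treatment.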
    Enabling Quartile-based Estimated-Mean Gradient Aggregation As Baseline for Federated Image Classifications. (arXiv:2309.12267v1 [cs.CR])
    Federated Learning (FL) has revolutionized how we train deep neural networks by enabling decentralized collaboration while safeguarding sensitive data and improving model performance. However, FL faces two crucial challenges: the diverse nature of data held by individual clients and the vulnerability of the FL system to security breaches. This paper introduces an innovative solution named Estimated Mean Aggregation (EMA) that not only addresses these challenges but also provides a fundamental reference point as a $\mathsf{baseline}$ for advanced aggregation techniques in FL systems. EMA's significance lies in its dual role: enhancing model security by effectively handling malicious outliers through trimmed means and uncovering data heterogeneity to ensure that trained models are adaptable across various client datasets. Through a wealth of experiments, EMA consistently demonstrates high accuracy and area under the curve (AUC) compared to alternative methods, establishing itself as a robust baseline for evaluating the effectiveness and security of FL aggregation methods. EMA's contributions thus offer a crucial step forward in advancing the efficiency, security, and versatility of decentralized deep learning in the context of FL.  ( 2 min )
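    A minimal sketch of trimmed-mean gradient aggregation in the spirit described above; the per-coordinate trimming and the trim fraction are assumptions for illustration, since the abstract does not spell out the exact estimator.

        import numpy as np

        def trimmed_mean_aggregate(client_updates, trim_frac=0.1):
            # Aggregate client gradient vectors by discarding, per coordinate, the largest
            # and smallest trim_frac fraction of values before averaging the remainder.
            stacked = np.sort(np.stack(client_updates), axis=0)   # (num_clients, dim)
            k = int(len(client_updates) * trim_frac)
            kept = stacked[k:len(client_updates) - k] if k > 0 else stacked
            return kept.mean(axis=0)

        updates = [np.random.randn(5) for _ in range(10)]
        updates.append(np.full(5, 100.0))                         # a malicious outlier update
        print(trimmed_mean_aggregate(updates, trim_frac=0.2))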
    LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models. (arXiv:2309.12307v1 [cs.CL])
    We present LongLoRA, an efficient fine-tuning approach that extends the context sizes of pre-trained large language models (LLMs) with limited computation cost. Typically, training LLMs with long context sizes is computationally expensive, requiring extensive training hours and GPU resources. For example, training on a context length of 8192 incurs 16x the computational cost in self-attention layers compared to a context length of 2048. In this paper, we speed up the context extension of LLMs in two aspects. On the one hand, although dense global attention is needed during inference, fine-tuning the model can be done effectively and efficiently with sparse local attention. The proposed shift short attention effectively enables context extension, leading to non-trivial computation savings with performance similar to fine-tuning with vanilla attention. In particular, it can be implemented with only two lines of code in training, while being optional in inference. On the other hand, we revisit the parameter-efficient fine-tuning regime for context expansion. Notably, we find that LoRA for context extension works well under the premise of trainable embedding and normalization. LongLoRA demonstrates strong empirical results on various tasks on LLaMA2 models from 7B/13B to 70B. LongLoRA extends LLaMA2 7B from 4k context to 100k, and LLaMA2 70B to 32k, on a single 8x A100 machine. LongLoRA extends models' context while retaining their original architectures, and is compatible with most existing techniques, like FlashAttention-2. In addition, to make LongLoRA practical, we collect a dataset, LongQA, for supervised fine-tuning. It contains more than 3k long context question-answer pairs.  ( 3 min )
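    A schematic of the shifted local-attention idea follows; the tensor layout, the function name, and the convention of rolling half of the heads by half a group are assumptions made for illustration and may not match the released implementation.

        import torch

        def shift_short_attention_groups(x, group_size):
            # x: (batch, seq_len, num_heads, head_dim). Half of the heads attend within
            # contiguous token groups as-is; the other half are rolled by half a group so
            # information can flow between neighboring groups during fine-tuning.
            b, n, h, d = x.shape
            shifted = x.clone()
            shifted[:, :, h // 2:] = torch.roll(shifted[:, :, h // 2:], shifts=-group_size // 2, dims=1)
            # Attention (not shown) would then be computed independently inside each group.
            return shifted.reshape(b, n // group_size, group_size, h, d)

        groups = shift_short_attention_groups(torch.randn(1, 4096, 8, 64), group_size=1024)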
    Brain Tumor Detection Using Deep Learning Approaches. (arXiv:2309.12193v1 [eess.IV])
    Brain tumors are collections of abnormal cells that can develop into masses or clusters. Because they have the potential to infiltrate other tissues, they pose a risk to the patient. MRI, the main imaging technique used, may be able to identify a brain tumor accurately. The rapid development of Deep Learning methods for computer vision applications has been facilitated by a vast amount of training data and improvements in model construction that offer better approximations in a supervised setting, and the need for such approaches has been the main driver of this expansion. Deep learning methods have shown promise in improving the precision of brain tumor detection and classification using magnetic resonance imaging (MRI). This paper presents a study on the use of deep learning techniques, especially ResNet50, for brain tumor identification, and investigates the possibility of automating the detection procedure. In this study, I utilized five transfer learning models, VGG16, VGG19, DenseNet121, ResNet50 and YOLO V4, of which ResNet50 provides the highest accuracy of 99.54%. The goal of the study is to guide researchers and medical professionals toward powerful brain tumor detecting systems employing deep learning approaches by way of this evaluation and analysis.  ( 2 min )
    Boolformer: Symbolic Regression of Logic Functions with Transformers. (arXiv:2309.12207v1 [cs.LG])
    In this work, we introduce Boolformer, the first Transformer architecture trained to perform end-to-end symbolic regression of Boolean functions. First, we show that it can predict compact formulas for complex functions which were not seen during training, when provided a clean truth table. Then, we demonstrate its ability to find approximate expressions when provided incomplete and noisy observations. We evaluate the Boolformer on a broad set of real-world binary classification datasets, demonstrating its potential as an interpretable alternative to classic machine learning methods. Finally, we apply it to the widespread task of modelling the dynamics of gene regulatory networks. Using a recent benchmark, we show that Boolformer is competitive with state-of-the art genetic algorithms with a speedup of several orders of magnitude. Our code and models are available publicly.  ( 2 min )
    A Survey on Transformers in Reinforcement Learning. (arXiv:2301.03044v3 [cs.LG] UPDATED)
    Transformer has been considered the dominating neural architecture in NLP and CV, mostly under supervised settings. Recently, a similar surge of using Transformers has appeared in the domain of reinforcement learning (RL), but it is faced with unique design choices and challenges brought by the nature of RL. However, the evolution of Transformers in RL has not yet been well unraveled. In this paper, we seek to systematically review motivations and progress on using Transformers in RL, provide a taxonomy on existing works, discuss each sub-field, and summarize future prospects.  ( 2 min )
    Learning End-to-End Channel Coding with Diffusion Models. (arXiv:2309.10505v2 [cs.IT] UPDATED)
    The training of neural encoders via deep learning necessitates a differentiable channel model due to the backpropagation algorithm. This requirement can be sidestepped by approximating either the channel distribution or its gradient through pilot signals in real-world scenarios. The initial approach draws upon the latest advancements in image generation, utilizing generative adversarial networks (GANs) or their enhanced variants to generate channel distributions. In this paper, we address this channel approximation challenge with diffusion models, which have demonstrated high sample quality in image generation. We offer an end-to-end channel coding framework underpinned by diffusion models and propose an efficient training algorithm. Our simulations with various channel models establish that our diffusion models learn the channel distribution accurately, thereby achieving near-optimal end-to-end symbol error rates (SERs). We also note a significant advantage of diffusion models: A robust generalization capability in high signal-to-noise ratio regions, in contrast to GAN variants that suffer from error floor. Furthermore, we examine the trade-off between sample quality and sampling speed, when an accelerated sampling algorithm is deployed, and investigate the effect of the noise scheduling on this trade-off. With an apt choice of noise scheduling, sampling time can be significantly reduced with a minor increase in SER.  ( 2 min )
    Semantic-aware Transmission Scheduling: a Monotonicity-driven Deep Reinforcement Learning Approach. (arXiv:2305.13706v2 [cs.LG] UPDATED)
    For cyber-physical systems in the 6G era, semantic communications connecting distributed devices for dynamic control and remote state estimation are required to guarantee application-level performance, not merely focus on communication-centric performance. Semantics here is a measure of the usefulness of information transmissions. Semantic-aware transmission scheduling of a large system often involves a large decision-making space, and the optimal policy cannot be obtained by existing algorithms effectively. In this paper, we first investigate the fundamental properties of the optimal semantic-aware scheduling policy and then develop advanced deep reinforcement learning (DRL) algorithms by leveraging the theoretical guidelines. Our numerical results show that the proposed algorithms can substantially reduce training time and enhance training performance compared to benchmark algorithms.
    Early diagnosis of autism spectrum disorder using machine learning approaches. (arXiv:2309.11646v1 [cs.LG])
    Autistic Spectrum Disorder (ASD) is a neurological condition characterized by difficulties with social interaction, communication, and repetitive activities. The severity of these difficulties varies, and those with this diagnosis face unique challenges. While its primary origin lies in genetics, identifying and addressing it early can contribute to improving the condition. In recent years, machine learning-driven intelligent diagnosis has emerged as a supplement to conventional clinical approaches, aiming to address the potential drawbacks of time-consuming and costly traditional methods. In this work, we utilize different machine learning algorithms to find the most significant traits responsible for ASD and to automate the diagnostic process. We study six classification models to see which works best to identify ASD and also study five popular clustering methods to gain meaningful insight into these ASD datasets. To find the best classifier for these binary datasets, we evaluate the models using accuracy, precision, recall, specificity, F1-score, AUC, kappa and log loss metrics. Our evaluation demonstrates that five out of the six selected models perform exceptionally, achieving a 100% accuracy rate on the ASD datasets when hyperparameters are meticulously tuned for each model. Since almost all classification models achieve 100% accuracy, we further examine the underlying structure of the datasets by applying several popular clustering algorithms. We calculate Normalized Mutual Information (NMI), Adjusted Rand Index (ARI) and Silhouette Coefficient (SC) metrics to select the best clustering models. Our evaluation finds that spectral clustering outperforms all other benchmark clustering models in terms of NMI and ARI metrics, and that it is comparable to the optimal SC achieved by k-means.
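    For reference, the three clustering metrics named above can be computed with scikit-learn as sketched below; the random feature matrix, labels, and the choice of spectral clustering with two clusters are illustrative placeholders rather than the paper's data or configuration.

        import numpy as np
        from sklearn.cluster import SpectralClustering
        from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score, silhouette_score

        X = np.random.rand(200, 10)           # placeholder feature matrix
        y = np.random.randint(0, 2, 200)      # placeholder ASD / non-ASD labels

        pred = SpectralClustering(n_clusters=2, random_state=0).fit_predict(X)
        print("NMI:", normalized_mutual_info_score(y, pred))
        print("ARI:", adjusted_rand_score(y, pred))
        print("SC :", silhouette_score(X, pred))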
    Grassmann Manifold Flows for Stable Shape Generation. (arXiv:2211.02900v2 [cs.LG] UPDATED)
    Recently, studies on machine learning have focused on methods that use symmetry implicit in a specific manifold as an inductive bias. Grassmann manifolds provide the ability to handle fundamental shapes represented as shape spaces, enabling stable shape analysis. In this paper, we present a novel approach in which we establish the theoretical foundations for learning distributions on the Grassmann manifold via continuous normalization flows, with the explicit goal of generating stable shapes. Our approach facilitates more robust generation by effectively eliminating the influence of extraneous transformations, such as rotations and inversions, through learning and generating within a Grassmann manifold designed to accommodate the essential shape information of the object. The experimental results indicated that the proposed method can generate high-quality samples by capturing the data structure. Furthermore, the proposed method significantly outperformed state-of-the-art methods in terms of the log-likelihood or evidence lower bound. The results obtained are expected to stimulate further research in this field, leading to advances in stable shape generation and analysis.
    Dynamic Hypergraph Structure Learning for Traffic Flow Forecasting. (arXiv:2309.12028v1 [cs.LG])
    This paper studies the problem of traffic flow forecasting, which aims to predict future traffic conditions on the basis of road networks and traffic conditions in the past. The problem is typically solved by modeling complex spatio-temporal correlations in traffic data using spatio-temporal graph neural networks (GNNs). However, the performance of these methods is still far from satisfactory since GNNs usually have limited representation capacity when it comes to complex traffic networks. Graphs, by nature, fall short in capturing non-pairwise relations. Even worse, existing methods follow the paradigm of message passing that aggregates neighborhood information linearly, which fails to capture complicated spatio-temporal high-order interactions. To tackle these issues, in this paper, we propose a novel model named Dynamic Hypergraph Structure Learning (DyHSL) for traffic flow prediction. To learn non-pairwise relationships, our DyHSL extracts hypergraph structural information to model dynamics in the traffic networks, and updates each node representation by aggregating messages from its associated hyperedges. Additionally, to capture high-order spatio-temporal relations in the road network, we introduce an interactive graph convolution block, which further models the neighborhood interaction for each node. Finally, we integrate these two views into a holistic multi-scale correlation extraction module, which conducts temporal pooling with different scales to model different temporal patterns. Extensive experiments on four popular traffic benchmark datasets demonstrate the effectiveness of our proposed DyHSL compared with a broad range of competing baselines.  ( 3 min )
    Self-Supervised Contrastive Learning for Robust Audio-Sheet Music Retrieval Systems. (arXiv:2309.12134v1 [cs.SD])
    Linking sheet music images to audio recordings remains a key problem for the development of efficient cross-modal music retrieval systems. One of the fundamental approaches toward this task is to learn a cross-modal embedding space via deep neural networks that is able to connect short snippets of audio and sheet music. However, the scarcity of annotated data from real musical content affects the capability of such methods to generalize to real retrieval scenarios. In this work, we investigate whether we can mitigate this limitation with self-supervised contrastive learning, by exposing a network to a large amount of real music data as a pre-training step, by contrasting randomly augmented views of snippets of both modalities, namely audio and sheet images. Through a number of experiments on synthetic and real piano data, we show that pre-trained models are able to retrieve snippets with better precision in all scenarios and pre-training configurations. Encouraged by these results, we employ the snippet embeddings in the higher-level task of cross-modal piece identification and conduct more experiments on several retrieval configurations. In this task, we observe that the retrieval quality improves from 30% up to 100% when real music data is present. We then conclude by arguing for the potential of self-supervised contrastive learning for alleviating the annotated data scarcity in multi-modal music retrieval models.
    Unsupervised Domain Adaptation for Self-Driving from Past Traversal Features. (arXiv:2309.12140v1 [cs.CV])
    The rapid development of 3D object detection systems for self-driving cars has significantly improved accuracy. However, these systems struggle to generalize across diverse driving environments, which can lead to safety-critical failures in detecting traffic participants. To address this, we propose a method that utilizes unlabeled repeated traversals of multiple locations to adapt object detectors to new driving environments. By incorporating statistics computed from repeated LiDAR scans, we guide the adaptation process effectively. Our approach enhances LiDAR-based detection models using spatial quantized historical features and introduces a lightweight regression head to leverage the statistics for feature regularization. Additionally, we leverage the statistics for a novel self-training process to stabilize the training. The framework is detector model-agnostic and experiments on real-world datasets demonstrate significant improvements, achieving up to a 20-point performance gain, especially in detecting pedestrians and distant objects. Code is available at https://github.com/zhangtravis/Hist-DA.
    Potential and limitations of random Fourier features for dequantizing quantum machine learning. (arXiv:2309.11647v1 [quant-ph])
    Quantum machine learning is arguably one of the most explored applications of near-term quantum devices. Much focus has been put on notions of variational quantum machine learning where parameterized quantum circuits (PQCs) are used as learning models. These PQC models have a rich structure which suggests that they might be amenable to efficient dequantization via random Fourier features (RFF). In this work, we establish necessary and sufficient conditions under which RFF does indeed provide an efficient dequantization of variational quantum machine learning for regression. We build on these insights to make concrete suggestions for PQC architecture design, and to identify structures which are necessary for a regression problem to admit a potential quantum advantage via PQC based optimization.
    TMac: Temporal Multi-Modal Graph Learning for Acoustic Event Classification. (arXiv:2309.11845v1 [cs.SD])
    Audiovisual data is everywhere in this digital age, which raises higher requirements for the deep learning models developed on it. Handling the information in multi-modal data well is the key to a better audiovisual model. We observe that such audiovisual data naturally has temporal attributes, such as the time information for each frame in the video. More concretely, such data is inherently multi-modal according to both audio and visual cues, which proceed in a strict chronological order. This indicates that temporal information is important for multi-modal acoustic event modeling, both intra-modal and inter-modal. However, existing methods deal with each modality's features independently and simply fuse them together, which neglects the mining of temporal relations and thus leads to sub-optimal performance. With this motivation, we propose a Temporal Multi-modal graph learning method for Acoustic event Classification, called TMac, which models such temporal information via graph learning techniques. In particular, we construct a temporal graph for each acoustic event, dividing its audio data and video data into multiple segments. Each segment can be considered as a node, and the temporal relationships between nodes can be considered as timestamps on their edges. In this case, we can smoothly capture the dynamic information within and across modalities. Several experiments are conducted to demonstrate that TMac outperforms other SOTA models in performance. Our code is available at https://github.com/MGitHubL/TMac.
    An Efficient Consolidation of Word Embedding and Deep Learning Techniques for Classifying Anticancer Peptides: FastText+BiLSTM. (arXiv:2309.12058v1 [cs.LG])
    Anticancer peptides (ACPs) are a group of peptides that exhibit antineoplastic properties. The utilization of ACPs in cancer prevention can present a viable substitute for conventional cancer therapeutics, as they possess a higher degree of selectivity and safety. Recent scientific advancements have generated interest in peptide-based therapies, which offer the advantage of efficiently treating intended cells without negatively impacting normal cells. However, as the number of peptide sequences continues to increase rapidly, developing a reliable and precise prediction model becomes a challenging task. In this work, our motivation is to advance an efficient model for categorizing anticancer peptides by combining word embedding and deep learning models. First, Word2Vec and FastText are evaluated as word embedding techniques for extracting peptide sequences. Then, the outputs of the word embedding models are fed into the deep learning approaches CNN, LSTM, and BiLSTM. To demonstrate the contribution of the proposed framework, extensive experiments are carried out on widely used datasets in the literature, ACPs250 and Independent. Experimental results show that the proposed model enhances classification accuracy compared to state-of-the-art studies. The proposed combination, FastText+BiLSTM, achieves 92.50% accuracy on the ACPs250 dataset and 96.15% accuracy on the Independent dataset, thereby setting a new state of the art.
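    A minimal sketch of the FastText+BiLSTM combination described above, assuming peptide sequences have already been embedded (e.g., per-residue vectors from a pre-trained FastText model); the embedding dimension, hidden size, and classifier head are illustrative choices, not the paper's configuration.

        import torch
        import torch.nn as nn

        class BiLSTMClassifier(nn.Module):
            def __init__(self, embed_dim=100, hidden=64, num_classes=2):
                super().__init__()
                self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True, bidirectional=True)
                self.fc = nn.Linear(2 * hidden, num_classes)

            def forward(self, x):              # x: (batch, seq_len, embed_dim) FastText vectors
                out, _ = self.lstm(x)
                return self.fc(out[:, -1])     # classify from the final time step

        batch = torch.randn(8, 50, 100)        # placeholder: 8 peptides of length 50
        logits = BiLSTMClassifier()(batch)     # (8, 2) class scores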
    Ensuring Topological Data-Structure Preservation under Autoencoder Compression due to Latent Space Regularization in Gauss--Legendre nodes. (arXiv:2309.08228v2 [cs.LG] UPDATED)
    We formulate a data-independent latent space regularisation constraint for general unsupervised autoencoders. The regularisation rests on sampling the autoencoder Jacobian at Legendre nodes, the centre of the Gauss-Legendre quadrature. Revisiting this classic result enables us to prove that regularised autoencoders ensure a one-to-one re-embedding of the initial data manifold to its latent representation. Demonstrations show that previously proposed regularisation strategies, such as contractive autoencoding, cause topological defects even for simple examples, as do convolutional-based (variational) autoencoders. In contrast, topological preservation is ensured already by standard multilayer perceptron neural networks when regularised due to our contribution. This observation extends from the classic FashionMNIST dataset up to real-world encoding problems for MRI brain scans, suggesting that, across disciplines, reliable low-dimensional representations of complex high-dimensional datasets can be delivered by this regularisation technique.  ( 2 min )
    A Variational Auto-Encoder Enabled Multi-Band Channel Prediction Scheme for Indoor Localization. (arXiv:2309.12200v1 [eess.SP])
    Indoor localization is in increasing demand for various cutting-edge technologies, such as virtual/augmented reality and smart homes. Traditional model-based localization suffers from significant computational overhead, so fingerprint localization is receiving increasing attention, as it requires lower computation cost once the fingerprint database is built. However, the accuracy of indoor localization is limited by the complicated indoor environment, which introduces multipath signal refraction. In this paper, we provide a scheme to improve the accuracy of indoor fingerprint localization in the frequency domain by predicting the channel state information (CSI) values from another transmitting channel and splicing the multi-band information together to obtain more precise localization results. We tested our proposed scheme on COST 2100 simulation data and real-time orthogonal frequency division multiplexing (OFDM) WiFi data collected from an office scenario.  ( 2 min )
    Learning to Drive Anywhere. (arXiv:2309.12295v1 [cs.CV])
    Human drivers can seamlessly adapt their driving decisions across geographical locations with diverse conditions and rules of the road, e.g., left vs. right-hand traffic. In contrast, existing models for autonomous driving have been thus far only deployed within restricted operational domains, i.e., without accounting for varying driving behaviors across locations or model scalability. In this work, we propose AnyD, a single geographically-aware conditional imitation learning (CIL) model that can efficiently learn from heterogeneous and globally distributed data with dynamic environmental, traffic, and social characteristics. Our key insight is to introduce a high-capacity geo-location-based channel attention mechanism that effectively adapts to local nuances while also flexibly modeling similarities among regions in a data-driven manner. By optimizing a contrastive imitation objective, our proposed approach can efficiently scale across inherently imbalanced data distributions and location-dependent events. We demonstrate the benefits of our AnyD agent across multiple datasets, cities, and scalable deployment paradigms, i.e., centralized, semi-supervised, and distributed agent training. Specifically, AnyD outperforms CIL baselines by over 14% in open-loop evaluation and 30% in closed-loop testing on CARLA.  ( 2 min )
    Bayesian sparsification for deep neural networks with Bayesian model reduction. (arXiv:2309.12095v1 [stat.ML])
    Deep learning's immense capabilities are often constrained by the complexity of its models, leading to an increasing demand for effective sparsification techniques. Bayesian sparsification for deep learning emerges as a crucial approach, facilitating the design of models that are both computationally efficient and competitive in terms of performance across various deep learning applications. The state-of-the-art -- in Bayesian sparsification of deep neural networks -- combines structural shrinkage priors on model weights with an approximate inference scheme based on black-box stochastic variational inference. However, model inversion of the full generative model is exceptionally computationally demanding, especially when compared to standard deep learning of point estimates. In this context, we advocate for the use of Bayesian model reduction (BMR) as a more efficient alternative for pruning of model weights. As a generalization of the Savage-Dickey ratio, BMR allows a post-hoc elimination of redundant model weights based on the posterior estimates under a straightforward (non-hierarchical) generative model. Our comparative study highlights the computational efficiency and the pruning rate of the BMR method relative to the established stochastic variational inference (SVI) scheme, when applied to the full hierarchical generative model. We illustrate the potential of BMR to prune model parameters across various deep learning architectures, from classical networks like LeNet to modern frameworks such as Vision Transformers and MLP-Mixers.
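    As a hedged illustration of the Savage-Dickey idea behind BMR, under Gaussian prior and posterior a weight can be flagged for removal when the posterior density at zero exceeds the prior density at zero, i.e. when the Bayes factor favors the reduced model with that weight clamped to zero. The Gaussian assumption, the unit prior scale, and the threshold below are simplifications for illustration, not the full hierarchical procedure.

        import numpy as np
        from scipy.stats import norm

        def savage_dickey_prune(post_mean, post_std, prior_std=1.0, threshold=1.0):
            # Savage-Dickey ratio p(w=0 | data) / p(w=0) per weight; values above the
            # threshold indicate evidence for the reduced model (prune the weight).
            bf_reduced = norm.pdf(0.0, loc=post_mean, scale=post_std) / norm.pdf(0.0, scale=prior_std)
            return bf_reduced > threshold

        mu = np.array([0.02, -1.30, 0.005, 0.80])   # placeholder posterior means
        sd = np.array([0.05, 0.10, 0.040, 0.10])    # placeholder posterior standard deviations
        print(savage_dickey_prune(mu, sd))          # e.g. [ True False  True False ]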
    Online Self-Concordant and Relatively Smooth Minimization, With Applications to Online Portfolio Selection and Learning Quantum States. (arXiv:2210.00997v3 [stat.ML] UPDATED)
    Consider an online convex optimization problem where the loss functions are self-concordant barriers, smooth relative to a convex function $h$, and possibly non-Lipschitz. We analyze the regret of online mirror descent with $h$. Then, based on the result, we prove the following in a unified manner. Denote by $T$ the time horizon and $d$ the parameter dimension. 1. For online portfolio selection, the regret of $\widetilde{\text{EG}}$, a variant of exponentiated gradient due to Helmbold et al., is $\tilde{O} ( T^{2/3} d^{1/3} )$ when $T > 4 d / \log d$. This improves on the original $\tilde{O} ( T^{3/4} d^{1/2} )$ regret bound for $\widetilde{\text{EG}}$. 2. For online portfolio selection, the regret of online mirror descent with the logarithmic barrier is $\tilde{O}(\sqrt{T d})$. The regret bound is the same as that of Soft-Bayes due to Orseau et al. up to logarithmic terms. 3. For online learning quantum states with the logarithmic loss, the regret of online mirror descent with the log-determinant function is also $\tilde{O} ( \sqrt{T d} )$. Its per-iteration time is shorter than all existing algorithms we know.  ( 3 min )
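    For orientation, the classical exponentiated-gradient update for online portfolio selection, the family that $\widetilde{\text{EG}}$ refines, can be sketched as follows; the learning rate and the price-relative data are placeholders, and this is the textbook update rather than the variant analyzed in the paper.

        import numpy as np

        def eg_update(weights, price_relatives, eta=0.05):
            # One exponentiated-gradient step for the log-wealth objective.
            # weights: current portfolio on the simplex; price_relatives: this round's x_t.
            growth = weights @ price_relatives
            new_w = weights * np.exp(eta * price_relatives / growth)
            return new_w / new_w.sum()

        w = np.ones(4) / 4                                  # uniform portfolio over 4 assets
        x = np.array([1.01, 0.99, 1.03, 0.97])              # placeholder price relatives
        w = eg_update(w, x)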
    FedGKD: Unleashing the Power of Collaboration in Federated Graph Neural Networks. (arXiv:2309.09517v3 [cs.LG] UPDATED)
    Federated training of Graph Neural Networks (GNNs) has become popular in recent years due to its ability to perform graph-related tasks under data isolation scenarios while preserving data privacy. However, graph heterogeneity issues in federated GNN systems continue to pose challenges. Existing frameworks address the problem by representing local tasks using different statistics and relating them through a simple aggregation mechanism. However, these approaches suffer from limited efficiency in two aspects: low quality of task-relatedness quantification and inefficacy in exploiting the collaboration structure. To address these issues, we propose FedGKD, a novel federated GNN framework that utilizes a novel client-side graph dataset distillation method to extract task features that better describe task-relatedness, and introduces a novel server-side aggregation mechanism that is aware of the global collaboration structure. We conduct extensive experiments on six real-world datasets of different scales, demonstrating that our framework outperforms existing approaches.
    Enhancing SAEAs with Unevaluated Solutions: A Case Study of Relation Model for Expensive Optimization. (arXiv:2309.11994v1 [cs.NE])
    Surrogate-assisted evolutionary algorithms (SAEAs) hold significant importance in resolving expensive optimization problems (EOPs). Extensive efforts have been devoted to improving the efficacy of SAEAs through the development of proficient model-assisted selection methods. However, generating high-quality solutions is a prerequisite for selection. The fundamental paradigm of evaluating a limited number of solutions in each generation within SAEAs reduces the variance of adjacent populations, thus impacting the quality of offspring solutions. This is a frequently encountered issue, yet it has not gained widespread attention. This paper presents a framework using unevaluated solutions to enhance the efficiency of SAEAs. The surrogate model is employed to identify high-quality solutions for direct generation of new solutions without evaluation. To ensure dependable selection, we have introduced two tailored relation models for the selection of the optimal solution and the unevaluated population. A comprehensive experimental analysis is performed on two test suites, which showcases the superiority of the relation model over regression and classification models in the selection phase. Furthermore, the surrogate-selected unevaluated solutions with high potential have been shown to significantly enhance the efficiency of the algorithm.
    SQUARE: Automatic Question Answering Evaluation using Multiple Positive and Negative References. (arXiv:2309.12250v1 [cs.CL])
    Evaluation of QA systems is very challenging and expensive, with the most reliable approach being human annotations of correctness of answers for questions. Recent works (AVA, BEM) have shown that transformer LM encoder based similarity metrics transfer well for QA evaluation, but they are limited by the usage of a single correct reference answer. We propose a new evaluation metric: SQuArE (Sentence-level QUestion AnsweRing Evaluation), using multiple reference answers (combining multiple correct and incorrect references) for sentence-form QA. We evaluate SQuArE on both sentence-level extractive (Answer Selection) and generative (GenQA) QA systems, across multiple academic and industrial datasets, and show that it outperforms previous baselines and obtains the highest correlation with human annotations.
    ZeroFlow: Fast, Zero Label, Scalable Scene Flow via Distillation. (arXiv:2305.10424v5 [cs.CV] UPDATED)
    Scene flow estimation is the task of describing the 3D motion field between temporally successive point clouds. State-of-the-art methods use strong priors and test-time optimization techniques, but require on the order of tens of seconds to process large-scale point clouds, making them unusable as computer vision primitives for real-time applications such as open world object detection. Feed forward methods are considerably faster, running on the order of tens to hundreds of milliseconds for large-scale point clouds, but require expensive human supervision. To address both limitations, we propose Scene Flow via Distillation, a simple, scalable distillation framework that uses a label-free optimization method to produce pseudo-labels to supervise a feed forward model. Our instantiation of this framework, ZeroFlow, achieves state-of-the-art performance on the Argoverse 2 Self-Supervised Scene Flow Challenge while using zero human labels by simply training on large-scale, diverse unlabeled data. At test-time, ZeroFlow is over 1000$\times$ faster than label-free state-of-the-art optimization-based methods on large-scale point clouds and over 1000$\times$ cheaper to train on unlabeled data compared to the cost of human annotation of that data. To facilitate further research, we will release our code, trained model weights, and high quality pseudo-labels for the Argoverse 2 and Waymo Open datasets.
    Fast Adaptation with Bradley-Terry Preference Models in Text-To-Image Classification and Generation. (arXiv:2308.07929v2 [cs.CV] UPDATED)
    Recently, large multimodal models, such as CLIP and Stable Diffusion, have achieved tremendous success in both foundational research and applications. However, as these models grow in parameter size and computational requirements, it becomes more challenging for users to personalize them for specific tasks or preferences. In this work, we address the problem of adapting such models to sets of particular human preferences, aligning the retrieved or generated images with the preferences of the user. We leverage the Bradley-Terry preference model to develop a fast adaptation method that efficiently fine-tunes the original model, with few examples and with minimal computing resources. Extensive evidence of the capabilities of this framework is provided through experiments in different domains related to multimodal text and image understanding, including preference prediction as a reward model, and generation tasks.
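    A minimal sketch of the Bradley-Terry preference objective that such fast adaptation could optimize. The trainable head on frozen embeddings and the stand-in data are illustrative assumptions, not the paper's exact setup.

    ```python
    import torch

    def bradley_terry_loss(score_preferred, score_rejected):
        # Bradley-Terry: P(a preferred over b) = sigmoid(s_a - s_b);
        # minimize the negative log-likelihood of the observed preferences.
        return -torch.nn.functional.logsigmoid(score_preferred - score_rejected).mean()

    # Hypothetical usage: a small trainable head on top of frozen CLIP-style embeddings.
    head = torch.nn.Linear(512, 1)
    opt = torch.optim.Adam(head.parameters(), lr=1e-3)
    emb_pref, emb_rej = torch.randn(16, 512), torch.randn(16, 512)  # stand-in embeddings
    loss = bradley_terry_loss(head(emb_pref), head(emb_rej))
    loss.backward(); opt.step()
    ```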
    See to Touch: Learning Tactile Dexterity through Visual Incentives. (arXiv:2309.12300v1 [cs.RO])
    Equipping multi-fingered robots with tactile sensing is crucial for achieving the precise, contact-rich, and dexterous manipulation that humans excel at. However, relying solely on tactile sensing fails to provide adequate cues for reasoning about objects' spatial configurations, limiting the ability to correct errors and adapt to changing situations. In this paper, we present Tactile Adaptation from Visual Incentives (TAVI), a new framework that enhances tactile-based dexterity by optimizing dexterous policies using vision-based rewards. First, we use a contrastive-based objective to learn visual representations. Next, we construct a reward function using these visual representations through optimal-transport based matching on one human demonstration. Finally, we use online reinforcement learning on our robot to optimize tactile-based policies that maximize the visual reward. On six challenging tasks, such as peg pick-and-place, unstacking bowls, and flipping slender objects, TAVI achieves a success rate of 73% using our four-fingered Allegro robot hand. This success rate is 108% higher than that of policies using tactile and vision-based rewards and 135% higher than that of policies without tactile observational input. Robot videos are best viewed on our project website: https://see-to-touch.github.io/.
    Performance Conditioning for Diffusion-Based Multi-Instrument Music Synthesis. (arXiv:2309.12283v1 [cs.SD])
    Generating multi-instrument music from symbolic music representations is an important task in Music Information Retrieval (MIR). A central but still largely unsolved problem in this context is musically and acoustically informed control in the generation process. As the main contribution of this work, we propose enhancing control of multi-instrument synthesis by conditioning a generative model on a specific performance and recording environment, thus allowing for better guidance of timbre and style. Building on state-of-the-art diffusion-based music generative models, we introduce performance conditioning - a simple tool that instructs the generative model to synthesize music with the style and timbre of specific instruments taken from specific performances. Our prototype is evaluated using uncurated performances with diverse instrumentation and achieves state-of-the-art FAD realism scores while allowing novel timbre and style control. Our project page, including samples and demonstrations, is available at benadar293.github.io/midipm
    Subsampling Suffices for Adaptive Data Analysis. (arXiv:2302.08661v2 [cs.LG] UPDATED)
    Ensuring that analyses performed on a dataset are representative of the entire population is one of the central problems in statistics. Most classical techniques assume that the dataset is independent of the analyst's query and break down in the common setting where a dataset is reused for multiple, adaptively chosen, queries. This problem of \emph{adaptive data analysis} was formalized in the seminal works of Dwork et al. (STOC, 2015) and Hardt and Ullman (FOCS, 2014). We identify a remarkably simple set of assumptions under which the queries will continue to be representative even when chosen adaptively: The only requirements are that each query takes as input a random subsample and outputs few bits. This result shows that the noise inherent in subsampling is sufficient to guarantee that query responses generalize. The simplicity of this subsampling-based framework allows it to model a variety of real-world scenarios not covered by prior work. In addition to its simplicity, we demonstrate the utility of this framework by designing mechanisms for two foundational tasks, statistical queries and median finding. In particular, our mechanism for answering the broadly applicable class of statistical queries is both extremely simple and state of the art in many parameter regimes.
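    A toy sketch of the subsampling idea for statistical queries: each query sees only a random subsample and returns a coarsely rounded answer (few output bits). The subsample size and rounding granularity below are illustrative choices, not the paper's parameters.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def subsampled_query(data, query_fn, m=200, bits=4):
        """Answer a statistical query on a random subsample, rounded to ~`bits` bits."""
        sub = rng.choice(data, size=m, replace=False)
        answer = np.mean(query_fn(sub))            # empirical mean on the subsample
        levels = 2 ** bits
        return np.round(answer * levels) / levels  # coarse output limits information leakage

    data = rng.normal(size=10_000)
    print(subsampled_query(data, lambda x: x > 0))  # approx. P(x > 0)
    ```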
    GLM Regression with Oblivious Corruptions. (arXiv:2309.11657v1 [cs.DS])
    We demonstrate the first algorithms for the problem of regression for generalized linear models (GLMs) in the presence of additive oblivious noise. We assume we have sample access to examples $(x, y)$ where $y$ is a noisy measurement of $g(w^* \cdot x)$. In particular, the noisy labels are of the form $y = g(w^* \cdot x) + \xi + \epsilon$, where $\xi$ is the oblivious noise drawn independently of $x$ and satisfies $\Pr[\xi = 0] \geq o(1)$, and $\epsilon \sim \mathcal N(0, \sigma^2)$. Our goal is to accurately recover a parameter vector $w$ such that the function $g(w \cdot x)$ has arbitrarily small error when compared to the true values $g(w^* \cdot x)$, rather than the noisy measurements $y$. We present an algorithm that tackles this problem in its most general distribution-independent setting, where the solution may not even be identifiable. Our algorithm returns an accurate estimate of the solution if it is identifiable, and otherwise returns a small list of candidates, one of which is close to the true solution. Furthermore, we provide a necessary and sufficient condition for identifiability, which holds in broad settings. Specifically, the problem is identifiable when the quantile at which $\xi + \epsilon = 0$ is known, or when the family of hypotheses does not contain candidates that are nearly equal to a translated $g(w^* \cdot x) + A$ for some real number $A$, while also having large error when compared to $g(w^* \cdot x)$. This is the first algorithmic result for GLM regression with oblivious noise which can handle more than half the samples being arbitrarily corrupted. Prior work focused largely on the setting of linear regression, and gave algorithms under restrictive assumptions.
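    A small sketch of the assumed noise model $y = g(w^* \cdot x) + \xi + \epsilon$, with a sigmoid link and an oblivious noise $\xi$ that is zero with constant probability and otherwise arbitrary but independent of $x$. The specific distributions chosen here are illustrative only.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def sample_glm_oblivious(n=1000, d=10, sigma=0.1, p_clean=0.6):
        g = lambda t: 1.0 / (1.0 + np.exp(-t))           # sigmoid link (illustrative choice)
        w_star = rng.normal(size=d)
        X = rng.normal(size=(n, d))
        xi = np.where(rng.random(n) < p_clean, 0.0,       # Pr[xi = 0] >= p_clean
                      rng.standard_cauchy(n) * 10.0)      # otherwise arbitrary, drawn independently of x
        eps = rng.normal(scale=sigma, size=n)
        y = g(X @ w_star) + xi + eps
        return X, y, w_star
    ```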
    S-GBDT: Frugal Differentially Private Gradient Boosting Decision Trees. (arXiv:2309.12041v1 [cs.CR])
    Privacy-preserving learning of gradient boosting decision trees (GBDT) has the potential for strong utility-privacy tradeoffs for tabular data, such as census data or medical meta data: classical GBDT learners can extract non-linear patterns from small sized datasets. The state-of-the-art notion for provable privacy-properties is differential privacy, which requires that the impact of single data points is limited and deniable. We introduce a novel differentially private GBDT learner and utilize four main techniques to improve the utility-privacy tradeoff. (1) We use an improved noise scaling approach with tighter accounting of privacy leakage of a decision tree leaf compared to prior work, resulting in noise that in expectation scales with $O(1/n)$, for $n$ data points. (2) We integrate individual R\'enyi filters to our method to learn from data points that have been underutilized during an iterative training process, which -- potentially of independent interest -- results in a natural yet effective insight to learning streams of non-i.i.d. data. (3) We incorporate the concept of random decision tree splits to concentrate privacy budget on learning leaves. (4) We deploy subsampling for privacy amplification. Our evaluation shows for the Abalone dataset ($<4k$ training data points) a $R^2$-score of $0.39$ for $\varepsilon=0.15$, which the closest prior work only achieved for $\varepsilon=10.0$. On the Adult dataset ($50k$ training data points) we achieve test error of $18.7\,\%$ for $\varepsilon=0.07$ which the closest prior work only achieved for $\varepsilon=1.0$. For the Abalone dataset for $\varepsilon=0.54$ we achieve $R^2$-score of $0.47$ which is very close to the $R^2$-score of $0.54$ for the nonprivate version of GBDT. For the Adult dataset for $\varepsilon=0.54$ we achieve test error $17.1\,\%$ which is very close to the test error $13.7\,\%$ of the nonprivate version of GBDT.
    Large-scale Pretraining Improves Sample Efficiency of Active Learning based Molecule Virtual Screening. (arXiv:2309.11687v1 [cs.LG])
    Virtual screening of large compound libraries to identify potential hit candidates is one of the earliest steps in drug discovery. As the size of commercially available compound collections grows exponentially to the scale of billions, brute-force virtual screening using traditional tools such as docking becomes infeasible in terms of time and computational resources. Active learning and Bayesian optimization have recently been proven to be effective methods of narrowing down the search space. An essential component of those methods is a surrogate machine learning model that is trained with a small subset of the library to predict the desired properties of compounds. An accurate model can achieve high sample efficiency by finding the most promising compounds with only a fraction of the whole library being virtually screened. In this study, we examined the performance of a pretrained transformer-based language model and a graph neural network in a Bayesian optimization active learning framework. The best pretrained model identifies 58.97% of the top-50000 compounds by docking score after screening only 0.6% of an ultra-large library containing 99.5 million compounds, an 8% improvement over the previous state-of-the-art baseline. Through extensive benchmarks, we show that the superior performance of pretrained models persists in both structure-based and ligand-based drug discovery. Such models can serve as a boost to the accuracy and sample efficiency of active learning based molecule virtual screening.
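    A schematic active-learning loop of the kind described: a surrogate (here a random forest as a stand-in for the pretrained transformer or GNN) is trained on the labeled pool and used to pick the next acquisition batch greedily. The compound features, "docking scores", and acquisition rule below are placeholders.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5000, 64))                   # placeholder compound features
    true_score = X @ rng.normal(size=64)              # stand-in for docking scores (lower is better)

    labeled = list(rng.choice(len(X), 100, replace=False))
    for _ in range(5):                                # acquisition rounds
        surrogate = RandomForestRegressor(n_estimators=100).fit(X[labeled], true_score[labeled])
        pred = surrogate.predict(X)
        pred[labeled] = np.inf                        # exclude compounds already docked
        batch = np.argsort(pred)[:100]                # greedily acquire best predicted scores
        labeled.extend(batch.tolist())                # "dock" the new batch and add to the pool
    ```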
    Dictionary Attack on IMU-based Gait Authentication. (arXiv:2309.11766v1 [cs.CR])
    We present a novel adversarial model for authentication systems that use gait patterns recorded by the inertial measurement unit (IMU) built into smartphones. The attack idea is inspired by and named after the concept of a dictionary attack on knowledge (PIN or password) based authentication systems. In particular, this work investigates whether it is possible to build a dictionary of IMUGait patterns and use it to launch an attack or find an imitator who can actively reproduce IMUGait patterns that match the target's IMUGait pattern. Nine physically and demographically diverse individuals walked at various levels of four predefined controllable and adaptable gait factors (speed, step length, step width, and thigh-lift), producing 178 unique IMUGait patterns. Each pattern attacked a wide variety of user authentication models. The deeper analysis of error rates (before and after the attack) challenges the belief that authentication systems based on IMUGait patterns are the most difficult to spoof; further research is needed on adversarial models and associated countermeasures.
    Distilling Adversarial Prompts from Safety Benchmarks: Report for the Adversarial Nibbler Challenge. (arXiv:2309.11575v1 [cs.CV])
    Text-conditioned image generation models have recently achieved astonishing image quality and alignment results. Consequently, they are employed in a fast-growing number of applications. Since they are highly data-driven, relying on billion-sized datasets randomly scraped from the web, they also produce unsafe content. As a contribution to the Adversarial Nibbler challenge, we distill a large set of over 1,000 potential adversarial inputs from existing safety benchmarks. Our analysis of the gathered prompts and corresponding images demonstrates the fragility of input filters and provides further insights into systematic safety issues in current generative image models.
    Differential Evolution Algorithm based Hyper-Parameters Selection of Transformer Neural Network Model for Load Forecasting. (arXiv:2307.15299v3 [cs.NE] UPDATED)
    Accurate load forecasting plays a vital role in numerous sectors, but accurately capturing the complex dynamics of dynamic power systems remains a challenge for traditional statistical models. For these reasons, time-series models (ARIMA) and deep-learning models (ANN, LSTM, GRU, etc.) are commonly deployed and often achieve greater success. In this paper, we analyze the efficacy of the recently developed Transformer-based Neural Network model for load forecasting. Transformer models have the potential to improve load forecasting because of their ability to learn long-range dependencies derived from their Attention Mechanism. We apply several metaheuristics, including Differential Evolution, to find the optimal hyperparameters of the Transformer-based Neural Network to produce accurate forecasts. Differential Evolution provides scalable, robust, global solutions to non-differentiable, multi-objective, or constrained optimization problems. Our work compares the proposed Transformer-based Neural Network model integrated with different metaheuristic algorithms by their performance in load forecasting based on numerical metrics such as Mean Squared Error (MSE) and Mean Absolute Percentage Error (MAPE). Our findings demonstrate the potential of metaheuristic-enhanced Transformer-based Neural Network models in load forecasting accuracy and provide optimal hyperparameters for each model.
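    A minimal sketch of Differential Evolution over Transformer hyperparameters using SciPy. The objective (a stand-in for validation MAPE of the trained forecaster) and the search bounds are placeholders, not the paper's configuration.

    ```python
    import numpy as np
    from scipy.optimize import differential_evolution

    def validation_mape(hparams):
        lr, n_heads, dropout = hparams
        # Placeholder: in practice, train the Transformer with these hyperparameters
        # and return validation MAPE; here a synthetic bowl-shaped surrogate is used.
        return (np.log10(lr) + 3) ** 2 + (round(n_heads) - 4) ** 2 + (dropout - 0.1) ** 2

    bounds = [(1e-5, 1e-2), (1, 8), (0.0, 0.5)]   # learning rate, attention heads, dropout
    result = differential_evolution(validation_mape, bounds, maxiter=20, seed=0)
    print(result.x, result.fun)
    ```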
    Meta OOD Learning for Continuously Adaptive OOD Detection. (arXiv:2309.11705v1 [cs.LG])
    Out-of-distribution (OOD) detection is crucial to modern deep learning applications by identifying and alerting about the OOD samples that should not be tested or used for making predictions. Current OOD detection methods have made significant progress when in-distribution (ID) and OOD samples are drawn from static distributions. However, this can be unrealistic when applied to real-world systems which often undergo continuous variations and shifts in ID and OOD distributions over time. Therefore, for effective application in real-world systems, the development of OOD detection methods that can adapt to these dynamic and evolving distributions is essential. In this paper, we propose a novel and more realistic setting called continuously adaptive out-of-distribution (CAOOD) detection, which targets developing an OOD detection model that enables dynamic and quick adaptation to a newly arriving distribution with insufficient ID samples at deployment time. To address CAOOD, we develop meta OOD learning (MOL) by designing a learning-to-adapt diagram such that a well-initialized OOD detection model is learned during the training process. In the testing process, MOL ensures OOD detection performance over shifting distributions by quickly adapting to new distributions with only a few adaptation steps. Extensive experiments on several OOD benchmarks endorse the effectiveness of our method in preserving both ID classification accuracy and OOD detection performance on continuously shifting distributions.
    The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A". (arXiv:2309.12288v1 [cs.CL])
    We expose a surprising failure of generalization in auto-regressive large language models (LLMs). If a model is trained on a sentence of the form "A is B", it will not automatically generalize to the reverse direction "B is A". This is the Reversal Curse. For instance, if a model is trained on "Olaf Scholz was the ninth Chancellor of Germany", it will not automatically be able to answer the question, "Who was the ninth Chancellor of Germany?". Moreover, the likelihood of the correct answer ("Olaf Scholz") will not be higher than for a random name. Thus, models exhibit a basic failure of logical deduction and do not generalize a prevalent pattern in their training set (i.e. if "A is B'' occurs, "B is A" is more likely to occur). We provide evidence for the Reversal Curse by finetuning GPT-3 and Llama-1 on fictitious statements such as "Uriah Hawthorne is the composer of 'Abyssal Melodies'" and showing that they fail to correctly answer "Who composed 'Abyssal Melodies?'". The Reversal Curse is robust across model sizes and model families and is not alleviated by data augmentation. We also evaluate ChatGPT (GPT-3.5 and GPT-4) on questions about real-world celebrities, such as "Who is Tom Cruise's mother? [A: Mary Lee Pfeiffer]" and the reverse "Who is Mary Lee Pfeiffer's son?". GPT-4 correctly answers questions like the former 79% of the time, compared to 33% for the latter. This shows a failure of logical deduction that we hypothesize is caused by the Reversal Curse. Code is available at https://github.com/lukasberglund/reversal_curse.
    Optimal Propagation for Graph Neural Networks. (arXiv:2205.02998v2 [cs.LG] UPDATED)
    Graph Neural Networks (GNNs) have achieved tremendous success in a variety of real-world applications by relying on the fixed graph data as input. However, the initial input graph might not be optimal in terms of specific downstream tasks, because of information scarcity, noise, adversarial attacks, or discrepancies between the distribution in graph topology, features, and groundtruth labels. In this paper, we propose a bi-level optimization approach for learning the optimal graph structure via directly learning the Personalized PageRank propagation matrix as well as the downstream semi-supervised node classification simultaneously. We also explore a low-rank approximation model for further reducing the time complexity. Empirical evaluations show the superior efficacy and robustness of the proposed model over all baseline methods.
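    For reference, a small sketch of the fixed Personalized PageRank propagation that this work instead learns end-to-end: iterate Z &lt;- (1-alpha) * A_hat * Z + alpha * X with a symmetrically normalized adjacency. The bi-level learning of the propagation matrix itself is not shown here.

    ```python
    import numpy as np

    def ppr_propagate(A, X, alpha=0.1, iters=10):
        """Approximate Z = alpha * (I - (1 - alpha) * A_hat)^{-1} X by power iteration."""
        A_tilde = A + np.eye(A.shape[0])              # add self-loops
        d = A_tilde.sum(1)
        A_hat = A_tilde / np.sqrt(np.outer(d, d))     # symmetric degree normalization
        Z = X.copy()
        for _ in range(iters):
            Z = (1 - alpha) * A_hat @ Z + alpha * X   # propagate while teleporting back to X
        return Z
    ```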
    Weakly-supervised Automated Audio Captioning via text only training. (arXiv:2309.12242v1 [cs.SD])
    In recent years, datasets of paired audio and captions have enabled remarkable success in automatically generating descriptions for audio clips, namely Automated Audio Captioning (AAC). However, it is labor-intensive and time-consuming to collect a sufficient number of paired audio and captions. Motivated by the recent advances in Contrastive Language-Audio Pretraining (CLAP), we propose a weakly-supervised approach to train an AAC model assuming only text data and a pre-trained CLAP model, alleviating the need for paired target data. Our approach leverages the similarity between audio and text embeddings in CLAP. During training, we learn to reconstruct the text from the CLAP text embedding, and during inference, we decode using the audio embeddings. To mitigate the modality gap between the audio and text embeddings we employ strategies to bridge the gap during training and inference stages. We evaluate our proposed method on Clotho and AudioCaps datasets demonstrating its ability to achieve a relative performance of up to ~$83\%$ compared to fully supervised approaches trained with paired target data.
    Human-in-the-Loop Causal Discovery under Latent Confounding using Ancestral GFlowNets. (arXiv:2309.12032v1 [cs.LG])
    Structure learning is the crux of causal inference. Notably, causal discovery (CD) algorithms are brittle when data is scarce, possibly inferring imprecise causal relations that contradict expert knowledge -- especially when considering latent confounders. To aggravate the issue, most CD methods do not provide uncertainty estimates, making it hard for users to interpret results and improve the inference process. Surprisingly, while CD is a human-centered affair, no works have focused on building methods that both 1) output uncertainty estimates that can be verified by experts and 2) interact with those experts to iteratively refine CD. To solve these issues, we start by proposing to sample (causal) ancestral graphs proportionally to a belief distribution based on a score function, such as the Bayesian information criterion (BIC), using generative flow networks. Then, we leverage the diversity in candidate graphs and introduce an optimal experimental design to iteratively probe the expert about the relations among variables, effectively reducing the uncertainty of our belief over ancestral graphs. Finally, we update our samples to incorporate human feedback via importance sampling. Importantly, our method does not require causal sufficiency (i.e., unobserved confounders may exist). Experiments with synthetic observational data show that our method can accurately sample from distributions over ancestral graphs and that we can greatly improve inference quality with human aid.
    Clustering-based Domain-Incremental Learning. (arXiv:2309.12078v1 [cs.LG])
    We consider the problem of learning multiple tasks in a continual learning setting in which data from different tasks is presented to the learner in a streaming fashion. A key challenge in this setting is the so-called "catastrophic forgetting problem", in which the performance of the learner in an "old task" decreases when subsequently trained on a "new task". Existing continual learning methods, such as Averaged Gradient Episodic Memory (A-GEM) and Orthogonal Gradient Descent (OGD), address catastrophic forgetting by minimizing the loss for the current task without increasing the loss for previous tasks. However, these methods assume the learner knows when the task changes, which is unrealistic in practice. In this paper, we alleviate the need to provide the algorithm with information about task changes by using an online clustering-based approach on a dynamically updated finite pool of samples or gradients. We thereby successfully counteract catastrophic forgetting in one of the hardest settings, namely: domain-incremental learning, a setting for which the problem was previously unsolved. We showcase the benefits of our approach by applying these ideas to projection-based methods, such as A-GEM and OGD, which lead to task-agnostic versions of them. Experiments on real datasets demonstrate the effectiveness of the proposed strategy and its promising performance compared to state-of-the-art methods.
    Traffic Forecasting on New Roads Using Spatial Contrastive Pre-Training (SCPT). (arXiv:2305.05237v4 [cs.LG] UPDATED)
    New roads are being constructed all the time. However, the capabilities of previous deep forecasting models to generalize to new roads not seen in the training data (unseen roads) are rarely explored. In this paper, we introduce a novel setup called a spatio-temporal (ST) split to evaluate the models' capabilities to generalize to unseen roads. In this setup, the models are trained on data from a sample of roads, but tested on roads not seen in the training data. Moreover, we also present a novel framework called Spatial Contrastive Pre-Training (SCPT) where we introduce a spatial encoder module to extract latent features from unseen roads during inference time. This spatial encoder is pre-trained using contrastive learning. During inference, the spatial encoder only requires two days of traffic data on the new roads and does not require any re-training. We also show that the output from the spatial encoder can be used effectively to infer latent node embeddings on unseen roads during inference time. The SCPT framework also incorporates a new layer, named the spatially gated addition (SGA) layer, to effectively combine the latent features from the output of the spatial encoder with existing backbones. Additionally, since there is limited data on the unseen roads, we argue that it is better to decouple traffic signals into trivial-to-capture periodic signals and difficult-to-capture Markovian signals, and for the spatial encoder to only learn the Markovian signals. Finally, we empirically evaluated SCPT using the ST split setup on four real-world datasets. The results showed that adding SCPT to a backbone consistently improves forecasting performance on unseen roads. More importantly, the improvements are greater when forecasting further into the future. The codes are available on GitHub: https://github.com/cruiseresearchgroup/forecasting-on-new-roads .
    A Dynamic Domain Adaptation Deep Learning Network for EEG-based Motor Imagery Classification. (arXiv:2309.11714v1 [eess.SP])
    There is a correlation between adjacent channels of electroencephalogram (EEG) signals, and how to represent this correlation is an issue currently being explored. In addition, because of inter-individual differences in EEG signals, new subjects must spend a substantial amount of calibration time before using an EEG-based motor imagery brain-computer interface. To solve these problems, we propose a Dynamic Domain Adaptation Based Deep Learning Network (DADL-Net). First, the EEG data is mapped to a three-dimensional geometric space and its temporal-spatial features are learned through a 3D convolution module; a spatial-channel attention mechanism is then used to strengthen the features, and a final convolution module further learns the spatial-temporal information of the features. Finally, to account for inter-subject and cross-session differences, we employ a dynamic domain-adaptation strategy: the distance between features is reduced by introducing a Maximum Mean Discrepancy loss function, and the classification layer is fine-tuned using part of the target domain data. We verify the performance of the proposed method on the BCI Competition IV 2a and OpenBMI datasets. In intra-subject experiments, accuracies of 70.42% and 73.91% were achieved on the OpenBMI and BCIC IV 2a datasets, respectively.
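    A sketch of the Maximum Mean Discrepancy term used to align source and target feature distributions; the Gaussian kernel and its bandwidth are illustrative choices rather than the paper's exact configuration.

    ```python
    import torch

    def gaussian_mmd(source, target, bandwidth=1.0):
        """Biased MMD^2 estimate between two feature batches with an RBF kernel."""
        def kernel(a, b):
            d2 = torch.cdist(a, b) ** 2
            return torch.exp(-d2 / (2 * bandwidth ** 2))
        return (kernel(source, source).mean()
                + kernel(target, target).mean()
                - 2 * kernel(source, target).mean())
    ```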
    Towards Robust and Truly Large-Scale Audio-Sheet Music Retrieval. (arXiv:2309.12158v1 [cs.SD])
    A range of applications of multi-modal music information retrieval is centred around the problem of connecting large collections of sheet music (images) to corresponding audio recordings, that is, identifying pairs of audio and score excerpts that refer to the same musical content. One of the typical and most recent approaches to this task employs cross-modal deep learning architectures to learn joint embedding spaces that link the two distinct modalities - audio and sheet music images. While there has been steady improvement on this front over the past years, a number of open problems still prevent large-scale employment of this methodology. In this article we attempt to provide an insightful examination of the current developments on audio-sheet music retrieval via deep learning methods. We first identify a set of main challenges on the road towards robust and large-scale cross-modal music retrieval in real scenarios. We then highlight the steps we have taken so far to address some of these challenges, documenting step-by-step improvement along several dimensions. We conclude by analysing the remaining challenges and present ideas for solving these, in order to pave the way to a unified and robust methodology for cross-modal music retrieval.
    Electroencephalogram Sensor Data Compression Using An Asymmetrical Sparse Autoencoder With A Discrete Cosine Transform Layer. (arXiv:2309.12201v1 [eess.SP])
    Electroencephalogram (EEG) data compression is necessary for wireless recording applications to reduce the amount of data that needs to be transmitted. In this paper, an asymmetrical sparse autoencoder with a discrete cosine transform (DCT) layer is proposed to compress EEG signals. The encoder module of the autoencoder has a combination of a fully connected linear layer and the DCT layer to reduce redundant data using hard-thresholding nonlinearity. Furthermore, the DCT layer includes trainable hard-thresholding parameters and scaling layers to give emphasis or de-emphasis on individual DCT coefficients. Finally, the one-by-one convolutional layer generates the latent space. The sparsity penalty-based cost function is employed to keep the feature map as sparse as possible in the latent space. The latent space data is transmitted to the receiver. The decoder module of the autoencoder is designed using the inverse DCT and two fully connected linear layers to improve the accuracy of data reconstruction. In comparison to other state-of-the-art methods, the proposed method significantly improves the average quality score in various data compression experiments.
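    The compress-then-reconstruct idea can be illustrated with a plain DCT and hard thresholding; in the paper the thresholds and scalings are trainable parameters inside an autoencoder, whereas the fixed keep-ratio below is an arbitrary illustrative choice.

    ```python
    import numpy as np
    from scipy.fft import dct, idct

    def dct_hard_threshold_compress(signal, keep_ratio=0.1):
        coeffs = dct(signal, norm="ortho")
        k = max(1, int(keep_ratio * len(coeffs)))
        threshold = np.sort(np.abs(coeffs))[-k]                 # keep the k largest-magnitude coefficients
        sparse = np.where(np.abs(coeffs) >= threshold, coeffs, 0.0)
        return idct(sparse, norm="ortho"), np.count_nonzero(sparse)
    ```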
    Fairness Vs. Personalization: Towards Equity in Epistemic Utility. (arXiv:2309.11503v1 [cs.IR])
    The applications of personalized recommender systems are rapidly expanding: encompassing social media, online shopping, search engine results, and more. These systems offer a more efficient way to navigate the vast array of items available. However, alongside this growth, there has been increased recognition of the potential for algorithmic systems to exhibit and perpetuate biases, risking unfairness in personalized domains. In this work, we explicate the inherent tension between personalization and conventional implementations of fairness. As an alternative, we propose equity to achieve fairness in the context of epistemic utility. We provide a mapping between goals and practical implementations and detail policy recommendations across key stakeholders to forge a path towards achieving fairness in personalized systems.
    Global universal approximation of functional input maps on weighted spaces. (arXiv:2306.03303v2 [stat.ML] UPDATED)
    We introduce so-called functional input neural networks defined on a possibly infinite dimensional weighted space with values also in a possibly infinite dimensional output space. To this end, we use an additive family as hidden layer maps and a non-linear activation function applied to each hidden layer. Relying on Stone-Weierstrass theorems on weighted spaces, we can prove a global universal approximation result for generalizations of continuous functions going beyond the usual approximation on compact sets. This then applies in particular to approximation of (non-anticipative) path space functionals via functional input neural networks. As a further application of the weighted Stone-Weierstrass theorem we prove a global universal approximation result for linear functions of the signature. We also introduce the viewpoint of Gaussian process regression in this setting and show that the reproducing kernel Hilbert space of the signature kernels are Cameron-Martin spaces of certain Gaussian processes. This paves the way towards uncertainty quantification for signature kernel regression.
    Incentivized Communication for Federated Bandits. (arXiv:2309.11702v1 [cs.LG])
    Most existing works on federated bandits take it for granted that all clients are altruistic about sharing their data with the server for the collective good whenever needed. Despite their compelling theoretical guarantee on performance and communication efficiency, this assumption is overly idealistic and oftentimes violated in practice, especially when the algorithm is operated over self-interested clients, who are reluctant to share data without explicit benefits. Negligence of such self-interested behaviors can significantly affect the learning efficiency and even the practical operability of federated bandit learning. In light of this, we aim to spark new insights into this under-explored research area by formally introducing an incentivized communication problem for federated bandits, where the server shall motivate clients to share data by providing incentives. Without loss of generality, we instantiate this bandit problem with the contextual linear setting and propose the first incentivized communication protocol, namely, Inc-FedUCB, that achieves near-optimal regret with provable communication and incentive cost guarantees. Extensive empirical experiments on both synthetic and real-world datasets further validate the effectiveness of the proposed method across various environments.
    Efficient Core-selecting Incentive Mechanism for Data Sharing in Federated Learning. (arXiv:2309.11722v1 [cs.GT])
    Federated learning is a distributed machine learning system that uses participants' data to train an improved global model. In federated learning, participants cooperatively train a global model, and they will receive the global model and payments. Rational participants try to maximize their individual utility, and they will not input their high-quality data truthfully unless they are provided with satisfactory payments based on their data quality. Furthermore, federated learning benefits from the cooperative contributions of participants. Accordingly, how to establish an incentive mechanism that both incentivizes inputting data truthfully and promotes stable cooperation has become an important issue to consider. In this paper, we introduce a data sharing game model for federated learning and employ game-theoretic approaches to design a core-selecting incentive mechanism by utilizing a popular concept in cooperative games, the core. In federated learning, the core can be empty, resulting in the core-selecting mechanism becoming infeasible. To address this, our core-selecting mechanism employs a relaxation method and simultaneously minimizes the benefits of inputting false data for all participants. However, this mechanism is computationally expensive because it requires aggregating exponential models for all possible coalitions, which is infeasible in federated learning. To address this, we propose an efficient core-selecting mechanism based on sampling approximation that only aggregates models on sampled coalitions to approximate the exact result. Extensive experiments verify that the efficient core-selecting mechanism can incentivize inputting high-quality data and stable cooperation, while it reduces computational overhead compared to the core-selecting mechanism.
    A Comprehensive Review of Community Detection in Graphs. (arXiv:2309.11798v1 [cs.SI])
    The study of complex networks has significantly advanced our understanding of community structure, a crucial feature of real-world graphs. Detecting communities in graphs is a challenging problem with applications in sociology, biology, and computer science. Despite the efforts of an interdisciplinary community of scientists, a satisfactory solution to this problem has not yet been achieved. This review article delves into the topic of community detection in graphs, which plays a crucial role in understanding the organization and functioning of complex systems. We begin by introducing the concept of community structure, which refers to the arrangement of vertices into clusters, with strong internal connections and weaker connections between clusters. Then, we provide a thorough exposition of various community detection methods, including a new method designed by us. Additionally, we explore real-world applications of community detection in diverse networks. In conclusion, this comprehensive review provides a deep understanding of community detection in graphs. It serves as a valuable resource for researchers and practitioners in multiple disciplines, offering insights into the challenges, methodologies, and applications of community detection in complex networks.
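    For readers wanting a concrete starting point, a common modularity-based baseline among the surveyed methods is readily available in NetworkX; this is a generic example, not the new method introduced in the review.

    ```python
    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities

    G = nx.karate_club_graph()                       # classic benchmark graph
    communities = greedy_modularity_communities(G)   # greedy modularity maximization
    for i, members in enumerate(communities):
        print(f"community {i}: {sorted(members)}")
    ```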
    ConDA: Contrastive Domain Adaptation for AI-generated Text Detection. (arXiv:2309.03992v2 [cs.CL] UPDATED)
    Large language models (LLMs) are increasingly being used for generating text in a variety of use cases, including journalistic news articles. Given the potential malicious nature in which these LLMs can be used to generate disinformation at scale, it is important to build effective detectors for such AI-generated text. Given the surge in development of new LLMs, acquiring labeled training data for supervised detectors is a bottleneck. However, there might be plenty of unlabeled text data available, without information on which generator it came from. In this work we tackle this data problem, in detecting AI-generated news text, and frame the problem as an unsupervised domain adaptation task. Here the domains are the different text generators, i.e. LLMs, and we assume we have access to only the labeled source data and unlabeled target data. We develop a Contrastive Domain Adaptation framework, called ConDA, that blends standard domain adaptation techniques with the representation power of contrastive learning to learn domain invariant representations that are effective for the final unsupervised detection task. Our experiments demonstrate the effectiveness of our framework, resulting in average performance gains of 31.7% from the best performing baselines, and within 0.8% margin of a fully supervised detector. All our code and data is available at https://github.com/AmritaBh/ConDA-gen-text-detection.
    DimCL: Dimensional Contrastive Learning For Improving Self-Supervised Learning. (arXiv:2309.11782v1 [cs.CV])
    Self-supervised learning (SSL) has gained remarkable success, for which contrastive learning (CL) plays a key role. However, the recent development of new non-CL frameworks has achieved comparable or better performance with high improvement potential, prompting researchers to enhance these frameworks further. Assimilating CL into non-CL frameworks has been thought to be beneficial, but empirical evidence indicates no visible improvements. In view of that, this paper proposes a strategy of performing CL along the dimensional direction instead of along the batch direction as done in conventional contrastive learning, named Dimensional Contrastive Learning (DimCL). DimCL aims to enhance the feature diversity, and it can serve as a regularizer to prior SSL frameworks. DimCL has been found to be effective, and the hardness-aware property is identified as a critical reason for its success. Extensive experimental results reveal that assimilating DimCL into SSL frameworks leads to performance improvement by a non-trivial margin on various datasets and backbone architectures.
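    A rough sketch of contrasting along the feature dimension rather than the batch dimension: embeddings are transposed so each feature dimension becomes an "instance" in an InfoNCE-style loss. The temperature and normalization details are illustrative, not the paper's exact formulation.

    ```python
    import torch
    import torch.nn.functional as F

    def dimensional_contrastive_loss(z1, z2, temperature=0.1):
        """z1, z2: (batch, dim) embeddings of two augmented views of the same batch."""
        d1 = F.normalize(z1.t(), dim=1)            # (dim, batch): one row per feature dimension
        d2 = F.normalize(z2.t(), dim=1)
        logits = d1 @ d2.t() / temperature         # similarity between dimensions across views
        targets = torch.arange(d1.size(0))         # the matching dimension is the positive
        return F.cross_entropy(logits, targets)
    ```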
    Privacy-Preserving In-Context Learning with Differentially Private Few-Shot Generation. (arXiv:2309.11765v1 [cs.LG])
    We study the problem of in-context learning (ICL) with large language models (LLMs) on private datasets. This scenario poses privacy risks, as LLMs may leak or regurgitate the private examples demonstrated in the prompt. We propose a novel algorithm that generates synthetic few-shot demonstrations from the private dataset with formal differential privacy (DP) guarantees, and show empirically that it can achieve effective ICL. We conduct extensive experiments on standard benchmarks and compare our algorithm with non-private ICL and zero-shot solutions. Our results demonstrate that our algorithm can achieve competitive performance with strong privacy levels. These results open up new possibilities for ICL with privacy protection for a broad range of applications.
    Quantum Conformal Prediction for Reliable Uncertainty Quantification in Quantum Machine Learning. (arXiv:2304.03398v2 [quant-ph] UPDATED)
    Quantum machine learning is a promising programming paradigm for the optimization of quantum algorithms in the current era of noisy intermediate scale quantum (NISQ) computers. A fundamental challenge in quantum machine learning is generalization, as the designer targets performance under testing conditions, while having access only to limited training data. Existing generalization analyses, while identifying important general trends and scaling laws, cannot be used to assign reliable and informative "error bars" to the decisions made by quantum models. In this article, we propose a general methodology that can reliably quantify the uncertainty of quantum models, irrespective of the amount of training data, of the number of shots, of the ansatz, of the training algorithm, and of the presence of quantum hardware noise. The approach, which builds on probabilistic conformal prediction, turns an arbitrary, possibly small, number of shots from a pre-trained quantum model into a set prediction, e.g., an interval, that provably contains the true target with any desired coverage level. Experimental results confirm the theoretical calibration guarantees of the proposed framework, referred to as quantum conformal prediction.
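    The conformal wrapper at the heart of this approach can be illustrated, in classical form, by split conformal prediction: calibration scores define a quantile, and the prediction set collects all labels whose score falls below it. The quantum-specific handling of shot noise is not shown here.

    ```python
    import numpy as np

    def conformal_prediction_sets(cal_probs, cal_labels, test_probs, alpha=0.1):
        """cal_probs: (n, K) predicted class probabilities on held-out calibration data."""
        n = len(cal_labels)
        scores = 1.0 - cal_probs[np.arange(n), cal_labels]              # nonconformity scores
        q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)          # finite-sample correction
        q = np.quantile(scores, q_level)
        return [np.where(1.0 - p <= q)[0] for p in test_probs]          # labels kept in each set
    ```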
    Hierarchical reinforcement learning with natural language subgoals. (arXiv:2309.11564v1 [cs.LG])
    Hierarchical reinforcement learning has been a compelling approach for achieving goal directed behavior over long sequences of actions. However, it has been challenging to implement in realistic or open-ended environments. A main challenge has been to find the right space of sub-goals over which to instantiate a hierarchy. We present a novel approach where we use data from humans solving these tasks to softly supervise the goal space for a set of long range tasks in a 3D embodied environment. In particular, we use unconstrained natural language to parameterize this space. This has two advantages: first, it is easy to generate this data from naive human participants; second, it is flexible enough to represent a vast range of sub-goals in human-relevant tasks. Our approach outperforms agents that clone expert behavior on these tasks, as well as HRL from scratch without this supervised sub-goal space. Our work presents a novel approach to combining human expert supervision with the benefits and flexibility of reinforcement learning.
    Regionally Additive Models: Explainable-by-design models minimizing feature interactions. (arXiv:2309.12215v1 [cs.LG])
    Generalized Additive Models (GAMs) are widely used explainable-by-design models in various applications. GAMs assume that the output can be represented as a sum of univariate functions, referred to as components. However, this assumption fails in ML problems where the output depends on multiple features simultaneously. In these cases, GAMs fail to capture the interaction terms of the underlying function, leading to subpar accuracy. To (partially) address this issue, we propose Regionally Additive Models (RAMs), a novel class of explainable-by-design models. RAMs identify subregions within the feature space where interactions are minimized. Within these regions, it is more accurate to express the output as a sum of univariate functions (components). Consequently, RAMs fit one component per subregion of each feature instead of one component per feature. This approach yields a more expressive model compared to GAMs while retaining interpretability. The RAM framework consists of three steps. Firstly, we train a black-box model. Secondly, using Regional Effect Plots, we identify subregions where the black-box model exhibits near-local additivity. Lastly, we fit a GAM component for each identified subregion. We validate the effectiveness of RAMs through experiments on both synthetic and real-world datasets. The results confirm that RAMs offer improved expressiveness compared to GAMs while maintaining interpretability.
    Prodigy: An Expeditiously Adaptive Parameter-Free Learner. (arXiv:2306.06101v2 [cs.LG] UPDATED)
    We consider the problem of estimating the learning rate in adaptive methods, such as Adagrad and Adam. We describe two techniques, Prodigy and Resetting, to provably estimate the distance to the solution $D$, which is needed to set the learning rate optimally. Our techniques are modifications of the D-Adaptation method for learning-rate-free learning. Our methods improve upon the convergence rate of D-Adaptation by a factor of $O(\sqrt{\log(D/d_0)})$, where $d_0$ is the initial estimate of $D$. We test our methods on 12 common logistic-regression benchmark datasets, VGG11 and ResNet-50 training on CIFAR10, ViT training on Imagenet, LSTM training on IWSLT14, DLRM training on Criteo dataset, VarNet on Knee MRI dataset, as well as RoBERTa and GPT transformer training on BookWiki. Our experimental results show that our approaches consistently outperform D-Adaptation and reach test accuracy values close to that of hand-tuned Adam.
    Multi-agent Deep Covering Skill Discovery. (arXiv:2210.03269v3 [cs.LG] UPDATED)
    The use of skills (a.k.a., options) can greatly accelerate exploration in reinforcement learning, especially when only sparse reward signals are available. While option discovery methods have been proposed for individual agents, in multi-agent reinforcement learning settings, discovering collaborative options that can coordinate the behavior of multiple agents and encourage them to visit the under-explored regions of their joint state space has not been considered. In this case, we propose Multi-agent Deep Covering Option Discovery, which constructs the multi-agent options through minimizing the expected cover time of the multiple agents' joint state space. Also, we propose a novel framework to adopt the multi-agent options in the MARL process. In practice, a multi-agent task can usually be divided into some sub-tasks, each of which can be completed by a sub-group of the agents. Therefore, our algorithm framework first leverages an attention mechanism to find collaborative agent sub-groups that would benefit most from coordinated actions. Then, a hierarchical algorithm, namely HA-MSAC, is developed to learn the multi-agent options for each sub-group to complete their sub-tasks first, and then to integrate them through a high-level policy as the solution of the whole task. This hierarchical option construction allows our framework to strike a balance between scalability and effective collaboration among the agents. The evaluation based on multi-agent collaborative tasks shows that the proposed algorithm can effectively capture the agent interactions with the attention mechanism, successfully identify multi-agent options, and significantly outperforms prior works using single-agent options or no options, in terms of both faster exploration and higher task rewards.
    Multiclass Learnability Does Not Imply Sample Compression. (arXiv:2308.06424v2 [cs.LG] UPDATED)
    A hypothesis class admits a sample compression scheme, if for every sample labeled by a hypothesis from the class, it is possible to retain only a small subsample, using which the labels on the entire sample can be inferred. The size of the compression scheme is an upper bound on the size of the subsample produced. Every learnable binary hypothesis class (which must necessarily have finite VC dimension) admits a sample compression scheme of size only a finite function of its VC dimension, independent of the sample size. For multiclass hypothesis classes, the analog of VC dimension is the DS dimension. We show that the analogous statement pertaining to sample compression is not true for multiclass hypothesis classes: there exist learnable multiclass hypothesis classes (necessarily of finite DS dimension) that do not admit a sample compression scheme whose size is only a finite function of their DS dimension.
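    To make the notion concrete, here is a toy sample compression scheme for one-dimensional threshold classifiers (a binary class of VC dimension 1): at most two retained points suffice to reconstruct the labels of the entire sample. The multiclass result above says that schemes of size bounded by the DS dimension need not exist.

    ```python
    def compress_thresholds(sample):
        """sample: list of (x, label) pairs consistent with some threshold classifier 1[x >= t]."""
        neg = [x for x, y in sample if y == 0]
        pos = [x for x, y in sample if y == 1]
        kept = []
        if neg: kept.append((max(neg), 0))   # right-most negative point
        if pos: kept.append((min(pos), 1))   # left-most positive point
        return kept                          # at most two labeled points

    def reconstruct(kept, xs):
        boundary = min((x for x, y in kept if y == 1), default=float("inf"))
        return [int(x >= boundary) for x in xs]  # recovers every label in the original sample
    ```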
    Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning. (arXiv:2309.11489v2 [cs.LG] UPDATED)
    Designing reward functions is a longstanding challenge in reinforcement learning (RL); it requires specialized knowledge or domain data, leading to high costs for development. To address this, we introduce Text2Reward, a data-free framework that automates the generation of dense reward functions based on large language models (LLMs). Given a goal described in natural language, Text2Reward generates dense reward functions as an executable program grounded in a compact representation of the environment. Unlike inverse RL and recent work that uses LLMs to write sparse reward codes, Text2Reward produces interpretable, free-form dense reward codes that cover a wide range of tasks, utilize existing packages, and allow iterative refinement with human feedback. We evaluate Text2Reward on two robotic manipulation benchmarks (ManiSkill2, MetaWorld) and two locomotion environments of MuJoCo. On 13 of the 17 manipulation tasks, policies trained with generated reward codes achieve similar or better task success rates and convergence speed than expert-written reward codes. For locomotion tasks, our method learns six novel locomotion behaviors with a success rate exceeding 94%. Furthermore, we show that the policies trained in the simulator with our method can be deployed in the real world. Finally, Text2Reward further improves the policies by refining their reward functions with human feedback. Video results are available at https://text-to-reward.github.io
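    An example of the kind of free-form dense reward code such a pipeline might emit for a pick-and-place goal. All state accessors here (`gripper_pos`, `obj_pos`, `goal_pos`, `is_grasped`) are hypothetical environment fields for illustration, not actual ManiSkill2 or MetaWorld APIs.

    ```python
    import numpy as np

    def dense_reward(state):
        # Hypothetical generated reward: approach the object, grasp it, then move it to the goal.
        reach = -np.linalg.norm(state.gripper_pos - state.obj_pos)       # encourage reaching
        grasp_bonus = 0.5 if state.is_grasped else 0.0
        place = -np.linalg.norm(state.obj_pos - state.goal_pos)          # encourage placing
        success_bonus = 2.0 if np.linalg.norm(state.obj_pos - state.goal_pos) < 0.02 else 0.0
        return reach + grasp_bonus + place + success_bonus
    ```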
    A Constructive Approach to Function Realization by Neural Stochastic Differential Equations. (arXiv:2307.00215v2 [math.OC] UPDATED)
    The problem of function approximation by neural dynamical systems has typically been approached in a top-down manner: Any continuous function can be approximated to an arbitrary accuracy by a sufficiently complex model with a given architecture. This can lead to high-complexity controls which are impractical in applications. In this paper, we take the opposite, constructive approach: We impose various structural restrictions on system dynamics and consequently characterize the class of functions that can be realized by such a system. The systems are implemented as a cascade interconnection of a neural stochastic differential equation (Neural SDE), a deterministic dynamical system, and a readout map. Both probabilistic and geometric (Lie-theoretic) methods are used to characterize the classes of functions realized by such systems.
    A Machine Learning-oriented Survey on Tiny Machine Learning. (arXiv:2309.11932v1 [cs.LG])
    The emergence of Tiny Machine Learning (TinyML) has positively revolutionized the field of Artificial Intelligence by promoting the joint design of resource-constrained IoT hardware devices and their learning-based software architectures. TinyML carries an essential role within the fourth and fifth industrial revolutions in helping societies, economies, and individuals employ effective AI-infused computing technologies (e.g., smart cities, automotive, and medical robotics). Given its multidisciplinary nature, the field of TinyML has been approached from many different angles: this comprehensive survey wishes to provide an up-to-date overview focused on all the learning algorithms within TinyML-based solutions. The survey is based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodological flow, allowing for a systematic and complete literature survey. In particular, firstly we will examine the three different workflows for implementing a TinyML-based system, i.e., ML-oriented, HW-oriented, and co-design. Secondly, we propose a taxonomy that covers the learning panorama under the TinyML lens, examining in detail the different families of model optimization and design, as well as the state-of-the-art learning techniques. Thirdly, this survey will present the distinct features of hardware devices and software tools that represent the current state-of-the-art for TinyML intelligent edge applications. Finally, we discuss the challenges and future directions.
    On the Probability of Immunity. (arXiv:2309.11942v1 [stat.ME])
    This work is devoted to the study of the probability of immunity, i.e. the effect occurs whether exposed or not. We derive necessary and sufficient conditions for non-immunity and $\epsilon$-bounded immunity, i.e. the probability of immunity is zero and $\epsilon$-bounded, respectively. The former allows us to estimate the probability of benefit (i.e., the effect occurs if and only if exposed) from a randomized controlled trial, and the latter allows us to produce bounds of the probability of benefit that are tighter than the existing ones. We also introduce the concept of indirect immunity (i.e., through a mediator) and repeat our previous analysis for it. Finally, we propose a method for sensitivity analysis of the probability of immunity under unmeasured confounding.
    Quasi-Monte Carlo for 3D Sliced Wasserstein. (arXiv:2309.11713v1 [stat.ML])
    Monte Carlo (MC) approximation has been used as the standard computation approach for the Sliced Wasserstein (SW) distance, which has an intractable expectation in its analytical form. However, the MC method is not optimal in terms of minimizing the absolute approximation error. To provide a better class of empirical SW, we propose quasi-sliced Wasserstein (QSW) approximations that rely on Quasi-Monte Carlo (QMC) methods. For a comprehensive investigation of QMC for SW, we focus on the 3D setting, specifically computing the SW between probability measures in three dimensions. In greater detail, we empirically verify various ways of constructing QMC points sets on the 3D unit-hypersphere, including Gaussian-based mapping, equal area mapping, generalized spiral points, and optimizing discrepancy energies. Furthermore, to obtain an unbiased estimation for stochastic optimization, we extend QSW into Randomized Quasi-Sliced Wasserstein (RQSW) by introducing randomness to the discussed low-discrepancy sequences. For theoretical properties, we prove the asymptotic convergence of QSW and the unbiasedness of RQSW. Finally, we conduct experiments on various 3D tasks, such as point-cloud comparison, point-cloud interpolation, image style transfer, and training deep point-cloud autoencoders, to demonstrate the favorable performance of the proposed QSW and RQSW variants.
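    A compact sketch of a 3D sliced Wasserstein estimate with a low-discrepancy projection set: a Fibonacci (generalized-spiral-style) point set on the unit sphere stands in for the QMC constructions studied, and each 1D Wasserstein distance is computed by sorting projections. The exact point-set constructions and the randomization are simplified away.

    ```python
    import numpy as np

    def fibonacci_sphere(n):
        i = np.arange(n) + 0.5
        phi = np.arccos(1 - 2 * i / n)            # polar angle
        theta = np.pi * (1 + 5 ** 0.5) * i        # golden-angle increment
        return np.stack([np.sin(phi) * np.cos(theta),
                         np.sin(phi) * np.sin(theta),
                         np.cos(phi)], axis=1)

    def sliced_wasserstein_qmc(X, Y, n_proj=128, p=2):
        dirs = fibonacci_sphere(n_proj)                            # low-discrepancy directions on S^2
        px, py = np.sort(X @ dirs.T, axis=0), np.sort(Y @ dirs.T, axis=0)
        return np.mean(np.abs(px - py) ** p) ** (1 / p)            # equal-size samples assumed

    rng = np.random.default_rng(0)
    print(sliced_wasserstein_qmc(rng.normal(size=(500, 3)), rng.normal(size=(500, 3)) + 1.0))
    ```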
    EPTQ: Enhanced Post-Training Quantization via Label-Free Hessian. (arXiv:2309.11531v1 [cs.CV])
    Quantization of deep neural networks (DNN) has become a key element in the efforts of embedding such networks on end-user devices. However, current quantization methods usually suffer from costly accuracy degradation. In this paper, we propose a new method for Enhanced Post Training Quantization named EPTQ. The method is based on knowledge distillation with an adaptive weighting of layers. In addition, we introduce a new label-free technique for approximating the Hessian trace of the task loss, named Label-Free Hessian. This technique removes the requirement of a labeled dataset for computing the Hessian. The adaptive knowledge distillation uses the Label-Free Hessian technique to give greater attention to the sensitive parts of the model while performing the optimization. Empirically, by employing EPTQ we achieve state-of-the-art results on a wide variety of models, tasks, and datasets, including ImageNet classification, COCO object detection, and Pascal-VOC for semantic segmentation. We demonstrate the performance and compatibility of EPTQ on an extended set of architectures, including CNNs, Transformers, hybrid, and MLP-only models.
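    A rough sketch of a Hutchinson-style Hessian-trace estimate, the standard building block behind such sensitivity measures; how the scalar objective is formed without labels is the paper's Label-Free Hessian contribution and is not reproduced here.

    ```python
    import torch

    def hutchinson_hessian_trace(loss, params, n_samples=16):
        """Estimate the trace of the Hessian of `loss` w.r.t. `params` via Hutchinson's method."""
        grads = torch.autograd.grad(loss, params, create_graph=True)
        trace = 0.0
        for _ in range(n_samples):
            vs = [torch.randint_like(g, 2) * 2.0 - 1.0 for g in grads]       # Rademacher vectors
            hv = torch.autograd.grad(grads, params, grad_outputs=vs, retain_graph=True)
            trace += sum((v * h).sum() for v, h in zip(vs, hv)).item()        # v^T H v
        return trace / n_samples
    ```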
    Bayesian Flow Networks. (arXiv:2308.07037v2 [cs.LG] UPDATED)
    This paper introduces Bayesian Flow Networks (BFNs), a new class of generative model in which the parameters of a set of independent distributions are modified with Bayesian inference in the light of noisy data samples, then passed as input to a neural network that outputs a second, interdependent distribution. Starting from a simple prior and iteratively updating the two distributions yields a generative procedure similar to the reverse process of diffusion models; however it is conceptually simpler in that no forward process is required. Discrete and continuous-time loss functions are derived for continuous, discretised and discrete data, along with sample generation procedures. Notably, the network inputs for discrete data lie on the probability simplex, and are therefore natively differentiable, paving the way for gradient-based sample guidance and few-step generation in discrete domains such as language modelling. The loss function directly optimises data compression and places no restrictions on the network architecture. In our experiments BFNs achieve competitive log-likelihoods for image modelling on dynamically binarized MNIST and CIFAR-10, and outperform all known discrete diffusion models on the text8 character-level language modelling task.
    Unveiling Optimal SDG Pathways: An Innovative Approach Leveraging Graph Pruning and Intent Graph for Effective Recommendations. (arXiv:2309.11741v1 [cs.IR])
    The recommendation of appropriate development pathways, also known as ecological civilization patterns for achieving Sustainable Development Goals (namely, sustainable development patterns), is of utmost importance for promoting ecological, economic, social, and resource sustainability in a specific region. To achieve this, the recommendation process must carefully consider the region's natural, environmental, resource, and economic characteristics. However, current recommendation algorithms in the field of computer science fall short in adequately addressing the spatial heterogeneity related to environment and the sparsity of regional historical interaction data, which limits their effectiveness in recommending sustainable development patterns. To overcome these challenges, this paper proposes a method called User Graph after Pruning and Intent Graph (UGPIG). Firstly, we utilize the high-density linking capability of the pruned User Graph to address the issue of spatial heterogeneity neglect in recommendation algorithms. Secondly, we construct an Intent Graph by incorporating the intent network, which captures the preferences for attributes including environmental elements of target regions. This approach effectively alleviates the problem of sparse historical interaction data in the region. Through extensive experiments, we demonstrate that UGPIG outperforms state-of-the-art recommendation algorithms like KGCN, KGAT, and KGIN in sustainable development pattern recommendations, with a maximum improvement of 9.61% in Top-3 recommendation performance.
    CATS: Conditional Adversarial Trajectory Synthesis for Privacy-Preserving Trajectory Data Publication Using Deep Learning Approaches. (arXiv:2309.11587v1 [cs.LG])
    The prevalence of ubiquitous location-aware devices and mobile Internet enables us to collect massive individual-level trajectory dataset from users. Such trajectory big data bring new opportunities to human mobility research but also raise public concerns with regard to location privacy. In this work, we present the Conditional Adversarial Trajectory Synthesis (CATS), a deep-learning-based GeoAI methodological framework for privacy-preserving trajectory data generation and publication. CATS applies K-anonymity to the underlying spatiotemporal distributions of human movements, which provides a distributional-level strong privacy guarantee. By leveraging conditional adversarial training on K-anonymized human mobility matrices, trajectory global context learning using the attention-based mechanism, and recurrent bipartite graph matching of adjacent trajectory points, CATS is able to reconstruct trajectory topology from conditionally sampled locations and generate high-quality individual-level synthetic trajectory data, which can serve as supplements or alternatives to raw data for privacy-preserving trajectory data publication. The experiment results on over 90k GPS trajectories show that our method has a better performance in privacy preservation, spatiotemporal characteristic preservation, and downstream utility compared with baseline methods, which brings new insights into privacy-preserving human mobility research using generative AI techniques and explores data ethics issues in GIScience.
    Drift Control of High-Dimensional RBM: A Computational Method Based on Neural Networks. (arXiv:2309.11651v1 [eess.SY])
    Motivated by applications in queueing theory, we consider a stochastic control problem whose state space is the $d$-dimensional positive orthant. The controlled process $Z$ evolves as a reflected Brownian motion whose covariance matrix is exogenously specified, as are its directions of reflection from the orthant's boundary surfaces. A system manager chooses a drift vector $\theta(t)$ at each time $t$ based on the history of $Z$, and the cost rate at time $t$ depends on both $Z(t)$ and $\theta(t)$. In our initial problem formulation, the objective is to minimize expected discounted cost over an infinite planning horizon, after which we treat the corresponding ergodic control problem. Extending earlier work by Han et al. (Proceedings of the National Academy of Sciences, 2018, 8505-8510), we develop and illustrate a simulation-based computational method that relies heavily on deep neural network technology. For test problems studied thus far, our method is accurate to within a fraction of one percent, and is computationally feasible in dimensions up to at least $d=30$.
    Private Matrix Factorization with Public Item Features. (arXiv:2309.11516v1 [cs.IR])
    We consider the problem of training private recommendation models with access to public item features. Training with Differential Privacy (DP) offers strong privacy guarantees, at the expense of loss in recommendation quality. We show that incorporating public item features during training can help mitigate this loss in quality. We propose a general approach based on collective matrix factorization (CMF), that works by simultaneously factorizing two matrices: the user feedback matrix (representing sensitive data) and an item feature matrix that encodes publicly available (non-sensitive) item information. The method is conceptually simple, easy to tune, and highly scalable. It can be applied to different types of public item data, including: (1) categorical item features; (2) item-item similarities learned from public sources; and (3) publicly available user feedback. Furthermore, these data modalities can be collectively utilized to fully leverage public data. Evaluating our method on a standard DP recommendation benchmark, we find that using public item features significantly narrows the quality gap between private models and their non-private counterparts. As privacy constraints become more stringent, models rely more heavily on public side features for recommendation. This results in a smooth transition from collaborative filtering to item-based contextual recommendations.
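    As a rough illustration of the collective factorization at the heart of the method, the NumPy sketch below jointly factorizes a (sensitive) feedback matrix R ~ U V^T and a (public) item-feature matrix F ~ V W^T through the shared item factors V. The matrix names, step sizes, and plain gradient update are assumptions for the example; the DP machinery (gradient clipping and noise on the user-data terms) is deliberately left out.

        import numpy as np

        def cmf_step(R, F, U, V, W, lr=0.01, lam=0.1, alpha=0.5):
            # One joint gradient step of collective matrix factorization:
            # the item factors V are shared between the feedback matrix
            # R ~ U V^T and the public item-feature matrix F ~ V W^T.
            E_r = R - U @ V.T          # feedback reconstruction error
            E_f = F - V @ W.T          # item-feature reconstruction error
            U -= lr * (-E_r @ V + lam * U)
            V -= lr * (-(E_r.T @ U) - alpha * (E_f @ W) + lam * V)
            W -= lr * (-(E_f.T @ V) + lam * W)
            return U, V, W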
    Instruction Tuning for Large Language Models: A Survey. (arXiv:2308.10792v2 [cs.CL] UPDATED)
    This paper surveys research works in the quickly advancing field of instruction tuning (IT), a crucial technique to enhance the capabilities and controllability of large language models (LLMs). Instruction tuning refers to the process of further training LLMs on a dataset consisting of \textsc{(instruction, output)} pairs in a supervised fashion, which bridges the gap between the next-word prediction objective of LLMs and the users' objective of having LLMs adhere to human instructions. In this work, we make a systematic review of the literature, including the general methodology of IT, the construction of IT datasets, the training of IT models, and applications to different modalities, domains, and settings, along with an analysis of aspects that influence the outcome of IT (e.g., generation of instruction outputs, size of the instruction dataset, etc.). We also review the potential pitfalls of IT and the criticism against it, point out current deficiencies of existing strategies, and suggest some avenues for fruitful research.
    Likelihood-based Sensor Calibration for Expert-Supported Distributed Learning Algorithms in IoT Systems. (arXiv:2309.11526v1 [cs.LG])
    An important task in the field of sensor technology is the efficient implementation of adaptation procedures of measurements from one sensor to another sensor of identical design. One idea is to use the estimation of an affine transformation between different systems, which can be improved by the knowledge of experts. This paper presents an improvement of a solution from glacier research that was published back in 1973. It is shown that this solution can be adapted for software calibration of sensors, implementation of expert-based adaptation, and federated learning methods. We evaluate our research with simulations and also with real measured data from a multi-sensor board with 8 identical sensors. The results show an improvement for both the simulation and the experiments with real data.
    Leveraging Negative Signals with Self-Attention for Sequential Music Recommendation. (arXiv:2309.11623v1 [cs.IR])
    Music streaming services heavily rely on their recommendation engines to continuously provide content to their consumers. Sequential recommendation has consequently seen considerable attention in the current literature, where state-of-the-art approaches focus on self-attentive models leveraging contextual information such as long- and short-term user history and item features; however, most of these studies focus on long-form content domains (retail, movies, etc.) rather than short-form, such as music. Additionally, many do not explore incorporating negative session-level feedback during training. In this study, we investigate the use of transformer-based self-attentive architectures to learn implicit session-level information for sequential music recommendation. We additionally propose a contrastive learning task to incorporate negative feedback (e.g., skipped tracks) to promote positive hits and penalize negative hits. This task is formulated as a simple loss term that can be incorporated into a variety of deep learning architectures for sequential recommendation. Our experiments show that this results in consistent performance gains over baseline architectures that ignore negative user feedback.
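    A hedged sketch of such a loss term (PyTorch; the tensor shapes, temperature, and softmax-style formulation are assumptions for illustration, not the paper's exact objective): the session representation is scored against the completed track and against the skipped tracks, and cross-entropy pushes the positive score up and the negative scores down.

        import torch
        import torch.nn.functional as F

        def session_feedback_loss(session_emb, pos_item_emb, neg_item_emb, tau=0.1):
            # session_emb: (batch, d), pos_item_emb: (batch, d),
            # neg_item_emb: (batch, n_neg, d) -- embeddings of skipped tracks.
            pos_scores = (session_emb * pos_item_emb).sum(-1) / tau
            neg_scores = (session_emb.unsqueeze(1) * neg_item_emb).sum(-1) / tau
            logits = torch.cat([pos_scores.unsqueeze(1), neg_scores], dim=1)
            labels = torch.zeros(logits.size(0), dtype=torch.long,
                                 device=logits.device)  # positive sits at index 0
            return F.cross_entropy(logits, labels)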
    A survey on the semantics of sequential patterns with negation. (arXiv:2309.11638v1 [cs.LG])
    A sequential pattern with negation, or negative sequential pattern, takes the form of a sequential pattern for which the negation symbol may be used in front of some of the pattern's itemsets. Intuitively, such a pattern occurs in a sequence if the negated itemsets are absent from the sequence. Recent work has shown that different semantics can be attributed to these pattern forms, and that state-of-the-art algorithms do not extract the same sets of patterns. This raises the important question of the interpretability of sequential patterns with negation. In this study, our focus is on exploring how potential users perceive negation in sequential patterns. Our aim is to determine whether specific semantics are more "intuitive" than others and whether these align with the semantics employed by one or more state-of-the-art algorithms. To achieve this, we designed a questionnaire to reveal which semantics each user finds intuitive. This article presents both the design of the questionnaire and an in-depth analysis of the 124 responses obtained. The outcomes indicate that two of the semantics are predominantly intuitive; however, neither of them aligns with the semantics of the primary state-of-the-art algorithms. As a result, we provide recommendations to account for this disparity in the conclusions drawn.
    The Broad Impact of Feature Imitation: Neural Enhancements Across Financial, Speech, and Physiological Domains. (arXiv:2309.12279v1 [cs.LG])
    Initialization of neural network weights plays a pivotal role in determining their performance. Feature Imitating Networks (FINs) offer a novel strategy by initializing weights to approximate specific closed-form statistical features, setting a promising foundation for deep learning architectures. While the applicability of FINs has been chiefly tested in biomedical domains, this study extends their exploration to other time series datasets. Three different experiments are conducted in this study to test the applicability of imitating Tsallis entropy for performance enhancement: Bitcoin price prediction, speech emotion recognition, and chronic neck pain (CNP) detection. For the Bitcoin price prediction, models embedded with FINs reduced the root mean square error by around 1000 compared to the baseline. In the speech emotion recognition task, the FIN-augmented model increased classification accuracy by over 3 percent. Lastly, in the CNP detection experiment, an improvement of about 7 percent was observed compared to established classifiers. These findings validate the broad utility and potency of FINs in diverse applications.  ( 2 min )
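    For reference, the closed-form feature being imitated here, the Tsallis entropy, is simple to compute directly; a small NumPy helper is sketched below (the entropic index q and the normalisation handling are illustrative choices).

        import numpy as np

        def tsallis_entropy(p, q=2.0, eps=1e-12):
            # S_q(p) = (1 - sum_i p_i^q) / (q - 1); recovers the Shannon
            # entropy in the limit q -> 1.
            p = np.asarray(p, dtype=float)
            p = p / (p.sum() + eps)
            if abs(q - 1.0) < 1e-8:
                return float(-(p * np.log(p + eps)).sum())
            return float((1.0 - np.power(p, q).sum()) / (q - 1.0))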
    Turaco: Complexity-Guided Data Sampling for Training Neural Surrogates of Programs. (arXiv:2309.11726v1 [cs.PL])
    Programmers and researchers are increasingly developing surrogates of programs, models of a subset of the observable behavior of a given program, to solve a variety of software development challenges. Programmers train surrogates from measurements of the behavior of a program on a dataset of input examples. A key challenge of surrogate construction is determining what training data to use to train a surrogate of a given program. We present a methodology for sampling datasets to train neural-network-based surrogates of programs. We first characterize the proportion of data to sample from each region of a program's input space (corresponding to different execution paths of the program) based on the complexity of learning a surrogate of the corresponding execution path. We next provide a program analysis to determine the complexity of different paths in a program. We evaluate these results on a range of real-world programs, demonstrating that complexity-guided sampling results in empirical improvements in accuracy.  ( 2 min )
    How Robust is Google's Bard to Adversarial Image Attacks?. (arXiv:2309.11751v1 [cs.CV])
    Multimodal Large Language Models (MLLMs) that integrate text and other modalities (especially vision) have achieved unprecedented performance in various multimodal tasks. However, due to the unsolved adversarial robustness problem of vision models, MLLMs can face more severe safety and security risks when vision inputs are introduced. In this work, we study the adversarial robustness of Google's Bard, a chatbot competitive with ChatGPT that recently released its multimodal capability, to better understand the vulnerabilities of commercial MLLMs. By attacking white-box surrogate vision encoders or MLLMs, the generated adversarial examples can mislead Bard into outputting wrong image descriptions with a 22% success rate based solely on transferability. We show that the adversarial examples can also attack other MLLMs, e.g., with a 26% attack success rate against Bing Chat and an 86% attack success rate against ERNIE bot. Moreover, we identify two defense mechanisms of Bard, including face detection and toxicity detection of images. We design corresponding attacks to evade these defenses, demonstrating that the current defenses of Bard are also vulnerable. We hope this work can deepen our understanding of the robustness of MLLMs and facilitate future research on defenses. Our code is available at https://github.com/thu-ml/Attack-Bard.  ( 2 min )
    A Study of Forward-Forward Algorithm for Self-Supervised Learning. (arXiv:2309.11955v1 [cs.CV])
    Self-supervised representation learning has seen remarkable progress in the last few years, with some of the recent methods being able to learn useful image representations without labels. These methods are trained using backpropagation, the de facto standard. Recently, Geoffrey Hinton proposed the forward-forward algorithm as an alternative training method. It utilizes two forward passes and a separate loss function for each layer to train the network without backpropagation. In this study, for the first time, we study the performance of forward-forward vs. backpropagation for self-supervised representation learning and provide insights into the learned representation spaces. Our benchmark employs four standard datasets, namely MNIST, F-MNIST, SVHN and CIFAR-10, and three commonly used self-supervised representation learning techniques, namely rotation, flip and jigsaw. Our main finding is that while the forward-forward algorithm performs comparably to backpropagation during (self-)supervised training, the transfer performance is significantly lagging behind in all the studied settings. This may be caused by a combination of factors, including having a loss function for each layer and the way the supervised training is realized in the forward-forward paradigm. In comparison to backpropagation, the forward-forward algorithm focuses more on the boundaries and drops part of the information unnecessary for making decisions which harms the representation learning goal. Further investigation and research are necessary to stabilize the forward-forward strategy for self-supervised learning, to work beyond the datasets and configurations demonstrated by Geoffrey Hinton.  ( 2 min )
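    To make the training procedure concrete, a hedged sketch of one forward-forward layer update is given below (PyTorch; `layer`, `opt`, and the threshold are hypothetical, and the layer-output normalisation used by Hinton is omitted): each layer is trained on a local "goodness" objective, with positive data pushed above a threshold and negative data pushed below it, and no gradients flowing between layers.

        import torch
        import torch.nn.functional as F

        def ff_layer_step(layer, opt, x_pos, x_neg, theta=2.0):
            # Goodness = sum of squared activations; raise it on positive
            # data, lower it on negative data, using a purely local loss.
            g_pos = layer(x_pos).pow(2).sum(dim=1)
            g_neg = layer(x_neg).pow(2).sum(dim=1)
            loss = F.softplus(theta - g_pos).mean() + F.softplus(g_neg - theta).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
            # The next layer is trained on this layer's detached output.
            return loss.item(), layer(x_pos).detach()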
    Using causal inference to avoid fallouts in data-driven parametric analysis: a case study in the architecture, engineering, and construction industry. (arXiv:2309.11509v1 [cs.CE])
    The decision-making process in real-world implementations has been affected by a growing reliance on data-driven models. We investigated the synergetic pattern between the data-driven methods, empirical domain knowledge, and first-principles simulations. We showed the potential risk of biased results when using data-driven models without causal analysis. Using a case study assessing the implication of several design solutions on the energy consumption of a building, we proved the necessity of causal analysis during the data-driven modeling process. We concluded that: (a) Data-driven models' accuracy assessment or domain knowledge screening may not rule out biased and spurious results; (b) Data-driven models' feature selection should involve careful consideration of causal relationships, especially colliders; (c) Causal analysis results can be used as an aid to first-principles simulation design and parameter checking to avoid cognitive biases. We proved the benefits of causal analysis when applied to data-driven models in building engineering.  ( 2 min )
    Passage Summarization with Recurrent Models for Audio-Sheet Music Retrieval. (arXiv:2309.12111v1 [cs.SD])
    Many applications of cross-modal music retrieval are related to connecting sheet music images to audio recordings. A typical and recent approach to this is to learn, via deep neural networks, a joint embedding space that correlates short fixed-size snippets of audio and sheet music by means of an appropriate similarity structure. However, two challenges that arise out of this strategy are the requirement of strongly aligned data to train the networks, and the inherent discrepancies of musical content between audio and sheet music snippets caused by local and global tempo differences. In this paper, we address these two shortcomings by designing a cross-modal recurrent network that learns joint embeddings that can summarize longer passages of corresponding audio and sheet music. The benefits of our method are that it only requires weakly aligned audio-sheet music pairs, as well as that the recurrent network handles the non-linearities caused by tempo variations between audio and sheet music. We conduct a number of experiments on synthetic and real piano data and scores, showing that our proposed recurrent method leads to more accurate retrieval in all possible configurations.  ( 2 min )
    TrueLearn: A Python Library for Personalised Informational Recommendations with (Implicit) Feedback. (arXiv:2309.11527v1 [cs.IR])
    This work describes the TrueLearn Python library, which contains a family of online learning Bayesian models for building educational (or more generally, informational) recommendation systems. This family of models was designed following the "open learner" concept, using humanly-intuitive user representations. For the sake of interpretability and putting the user in control, the TrueLearn library also contains different representations to help end-users visualise the learner models, which may in the future facilitate user interaction with their own models. Together with the library, we include a previously publicly released implicit feedback educational dataset with evaluation metrics to measure the performance of the models. The extensive documentation and coding examples make the library highly accessible to both machine learning developers and educational data mining and learning analytics practitioners. The library and the support documentation with examples are available at https://truelearn.readthedocs.io/en/latest.  ( 2 min )
    Activation Compression of Graph Neural Networks using Block-wise Quantization with Improved Variance Minimization. (arXiv:2309.11856v1 [stat.ML])
    Efficient training of large-scale graph neural networks (GNNs) has been studied with a specific focus on reducing their memory consumption. Work by Liu et al. (2022) proposed extreme activation compression (EXACT), which demonstrated a drastic reduction in memory consumption by quantizing the intermediate activation maps down to INT2 precision. They showed little to no reduction in performance while achieving large reductions in GPU memory consumption. In this work, we present an improvement to the EXACT strategy by using block-wise quantization of the intermediate activation maps. We experimentally analyze different block sizes and show a further reduction in memory consumption (>15%) and a runtime speedup per epoch (about 5%) even at extreme levels of quantization, with performance trade-offs similar to those of the original EXACT. Further, we present a correction to the assumptions on the distribution of intermediate activation maps in EXACT (assumed to be uniform) and show improved variance estimations of the quantization and dequantization steps.  ( 2 min )
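    The block-wise idea itself is easy to sketch: instead of one quantization range per tensor, each block of values gets its own range. The PyTorch snippet below is an illustrative round-trip (quantize then dequantize) with a per-block min/max range; the block size, bit width, and deterministic rounding are assumptions, and EXACT's stochastic rounding and storage of the compressed activations for the backward pass are not shown.

        import torch

        def blockwise_quant_dequant(x, block_size=256, n_bits=2):
            # Pad, split into blocks, and quantize each block with its own scale.
            flat = x.reshape(-1)
            pad = (-flat.numel()) % block_size
            flat = torch.cat([flat, flat.new_zeros(pad)])
            blocks = flat.reshape(-1, block_size)
            lo = blocks.min(dim=1, keepdim=True).values
            hi = blocks.max(dim=1, keepdim=True).values
            scale = (hi - lo).clamp_min(1e-8) / (2 ** n_bits - 1)
            q = torch.round((blocks - lo) / scale).clamp(0, 2 ** n_bits - 1)
            dequant = q * scale + lo            # what the backward pass would see
            return dequant.reshape(-1)[: x.numel()].reshape(x.shape)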
    Orbital AI-based Autonomous Refuelling Solution. (arXiv:2309.11648v1 [cs.CV])
    Cameras are rapidly becoming the sensor of choice for on-board space rendezvous due to their small form factor and low power, mass, and volume costs. When it comes to docking, however, they typically serve a secondary role, whereas the main work is done by active sensors such as lidar. This paper documents the development of a proposed AI-based (artificial intelligence) navigation algorithm intended to mature the use of on-board visible wavelength cameras as a main sensor for docking and on-orbit servicing (OOS), reducing the dependency on lidar and greatly reducing costs. Specifically, the use of AI enables the expansion of the relative navigation solution towards multiple classes of scenarios, e.g., in terms of targets or illumination conditions, which would otherwise have to be crafted on a case-by-case basis using classical image processing methods. Multiple convolutional neural network (CNN) backbone architectures are benchmarked on synthetically generated data of docking manoeuvres with the International Space Station (ISS), achieving position and attitude estimates close to 1% range-normalised and 1 deg, respectively. The integration of the solution with a physical prototype of the refuelling mechanism is validated in the laboratory using a robotic arm to simulate a berthing procedure.  ( 2 min )
    Popularity Degradation Bias in Local Music Recommendation. (arXiv:2309.11671v1 [cs.IR])
    In this paper, we study the effect of popularity degradation bias in the context of local music recommendations. Specifically, we examine how accurate two top-performing recommendation algorithms, Weight Relevance Matrix Factorization (WRMF) and Multinomial Variational Autoencoder (Mult-VAE), are at recommending artists as a function of artist popularity. We find that both algorithms improve recommendation performance for more popular artists and, as such, exhibit popularity degradation bias. While both algorithms produce a similar level of performance for more popular artists, Mult-VAE shows better relative performance for less popular artists. This suggests that this algorithm should be preferred for local (long-tail) music artist recommendation.  ( 2 min )
    Multidimensional well-being of US households at a fine spatial scale using fused household surveys: fusionACS. (arXiv:2309.11512v1 [stat.AP])
    Social science often relies on surveys of households and individuals. Dozens of such surveys are regularly administered by the U.S. government. However, they field independent, unconnected samples with specialized questions, limiting research questions to those that can be answered by a single survey. The fusionACS project seeks to integrate data from multiple U.S. household surveys by statistically "fusing" variables from "donor" surveys onto American Community Survey (ACS) microdata. This results in an integrated microdataset of household attributes and well-being dimensions that can be analyzed to address research questions in ways that are not currently possible. The presented data comprise the fusion onto the ACS of select donor variables from the Residential Energy Consumption Survey (RECS) of 2015, the National Household Transportation Survey (NHTS) of 2017, the American Housing Survey (AHS) of 2019, and the Consumer Expenditure Survey - Interview (CEI) for the years 2015-2019. The underlying statistical techniques are included in an open-source $R$ package, fusionModel, that provides generic tools for the creation, analysis, and validation of fused microdata.  ( 2 min )
    Dr. FERMI: A Stochastic Distributionally Robust Fair Empirical Risk Minimization Framework. (arXiv:2309.11682v1 [cs.LG])
    While training fair machine learning models has been studied extensively in recent years, most developed methods rely on the assumption that the training and test data have similar distributions. In the presence of distribution shifts, fair models may behave unfairly on test data. There have been some developments for fair learning robust to distribution shifts to address this shortcoming. However, most proposed solutions are based on the assumption of having access to the causal graph describing the interaction of different features. Moreover, existing algorithms require full access to data and cannot be used when small batches are used (stochastic/batch implementation). This paper proposes the first stochastic distributionally robust fairness framework with convergence guarantees that do not require knowledge of the causal graph. More specifically, we formulate the fair inference in the presence of the distribution shift as a distributionally robust optimization problem under $L_p$ norm uncertainty sets with respect to the Exponential Renyi Mutual Information (ERMI) as the measure of fairness violation. We then discuss how the proposed method can be implemented in a stochastic fashion. We have evaluated the presented framework's performance and efficiency through extensive experiments on real datasets consisting of distribution shifts.  ( 2 min )
    Ad-load Balancing via Off-policy Learning in a Content Marketplace. (arXiv:2309.11518v1 [cs.IR])
    Ad-load balancing is a critical challenge in online advertising systems, particularly in the context of social media platforms, where the goal is to maximize user engagement and revenue while maintaining a satisfactory user experience. This requires the optimization of conflicting objectives, such as user satisfaction and ads revenue. Traditional approaches to ad-load balancing rely on static allocation policies, which fail to adapt to changing user preferences and contextual factors. In this paper, we present an approach that leverages off-policy learning and evaluation from logged bandit feedback. We start by presenting a motivating analysis of the ad-load balancing problem, highlighting the conflicting objectives between user satisfaction and ads revenue. We emphasize the nuances that arise due to user heterogeneity and the dependence on the user's position within a session. Based on this analysis, we define the problem as determining the optimal ad-load for a particular feed fetch. To tackle this problem, we propose an off-policy learning framework that leverages unbiased estimators such as Inverse Propensity Scoring (IPS) and Doubly Robust (DR) to learn and estimate the policy values using offline collected stochastic data. We present insights from online A/B experiments deployed at scale across over 80 million users generating over 200 million sessions, where we find statistically significant improvements in both user satisfaction metrics and ads revenue for the platform.  ( 2 min )
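    The two estimators named in the abstract are standard and easy to state; the sketch below (NumPy, with hypothetical array names) shows how logged bandit feedback can be turned into an off-policy estimate of a candidate ad-load policy's value.

        import numpy as np

        def ips_estimate(rewards, logged_propensities, target_probs):
            # Inverse Propensity Scoring: reweight logged rewards by the
            # ratio of the target policy's action probability to the
            # logging policy's propensity for the same action.
            w = target_probs / logged_propensities
            return float(np.mean(w * rewards))

        def dr_estimate(rewards, logged_propensities, target_probs,
                        q_hat_logged, q_hat_target):
            # Doubly Robust: a reward model (q_hat) plus an IPS correction
            # on its residuals; consistent if either component is accurate.
            w = target_probs / logged_propensities
            return float(np.mean(q_hat_target + w * (rewards - q_hat_logged)))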
    Towards Differential Privacy in Sequential Recommendation: A Noisy Graph Neural Network Approach. (arXiv:2309.11515v1 [cs.CR])
    With the increasing frequency of high-profile privacy breaches on online platforms, users are becoming more concerned about their privacy. Recommender systems are the core component of online platforms for providing personalized services; consequently, their privacy preservation has attracted great attention. As the gold standard of privacy protection, differential privacy has been widely adopted to preserve privacy in recommender systems. However, existing differentially private recommender systems only consider static and independent interactions, so they cannot be applied to sequential recommendation, where behaviors are dynamic and dependent. Meanwhile, little attention has been paid to the privacy risk of sensitive user features; most existing methods only protect user feedback. In this work, we propose a novel DIfferentially Private Sequential recommendation framework with a noisy Graph Neural Network approach (denoted as DIPSGNN) to address these limitations. To the best of our knowledge, we are the first to achieve differential privacy in sequential recommendation with dependent interactions. Specifically, in DIPSGNN, we first leverage the piecewise mechanism to protect sensitive user features. Then, we add calibrated noise to the aggregation step of the graph neural network based on an aggregation perturbation mechanism. This noisy graph neural network can protect sequentially dependent interactions and capture user preferences simultaneously. Extensive experiments demonstrate the superiority of our method over state-of-the-art differentially private recommender systems in terms of a better balance between privacy and accuracy.  ( 2 min )
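    The aggregation-perturbation step can be pictured with a small sketch (PyTorch; a dense adjacency matrix and a single Gaussian noise scale are simplifying assumptions, and calibrating the noise to a formal DP budget is not shown): the noise is injected into the neighbourhood aggregation rather than into the raw interactions.

        import torch

        def noisy_mean_aggregate(h, adj, sigma):
            # h: (n_nodes, d) node features; adj: (n_nodes, n_nodes) adjacency.
            agg = adj @ h                               # neighbourhood sum
            agg = agg + sigma * torch.randn_like(agg)   # Gaussian mechanism
            deg = adj.sum(dim=1, keepdim=True).clamp_min(1.0)
            return agg / deg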
    Latent Diffusion Models for Structural Component Design. (arXiv:2309.11601v1 [cs.LG])
    Recent advances in diffusion models have revolutionized generative modeling, enabling high-quality image generation tailored to user needs. This paper proposes a framework for the generative design of structural components. Specifically, we employ a Latent Diffusion model to generate potential designs of a component that can satisfy a set of problem-specific loading conditions. One of the distinct advantages our approach offers over other generative approaches, such as generative adversarial networks (GANs), is that it permits the editing of existing designs. We train our model using a dataset of geometries obtained from structural topology optimization utilizing the SIMP algorithm. Consequently, our framework generates inherently near-optimal designs. Our work presents quantitative results that support the structural performance of the generated designs and the variability in potential candidate designs. Furthermore, we provide evidence of the scalability of our framework by operating over voxel domains with resolutions varying from $32^3$ to $128^3$. Our framework can be used as a starting point for generating novel near-optimal designs similar to topology-optimized designs.  ( 2 min )
  • Open

    On the Probability of Immunity. (arXiv:2309.11942v1 [stat.ME])
    This work is devoted to the study of the probability of immunity, i.e. the effect occurs whether exposed or not. We derive necessary and sufficient conditions for non-immunity and $\epsilon$-bounded immunity, i.e. the probability of immunity is zero and $\epsilon$-bounded, respectively. The former allows us to estimate the probability of benefit (i.e., the effect occurs if and only if exposed) from a randomized controlled trial, and the latter allows us to produce bounds of the probability of benefit that are tighter than the existing ones. We also introduce the concept of indirect immunity (i.e., through a mediator) and repeat our previous analysis for it. Finally, we propose a method for sensitivity analysis of the probability of immunity under unmeasured confounding.
    Grassmann Manifold Flows for Stable Shape Generation. (arXiv:2211.02900v2 [cs.LG] UPDATED)
    Recently, studies on machine learning have focused on methods that use symmetry implicit in a specific manifold as an inductive bias. Grassmann manifolds provide the ability to handle fundamental shapes represented as shape spaces, enabling stable shape analysis. In this paper, we present a novel approach in which we establish the theoretical foundations for learning distributions on the Grassmann manifold via continuous normalizing flows, with the explicit goal of generating stable shapes. Our approach facilitates more robust generation by effectively eliminating the influence of extraneous transformations, such as rotations and inversions, through learning and generating within a Grassmann manifold designed to accommodate the essential shape information of the object. The experimental results indicated that the proposed method can generate high-quality samples by capturing the data structure. Furthermore, the proposed method significantly outperformed state-of-the-art methods in terms of the log-likelihood or evidence lower bound. The results obtained are expected to stimulate further research in this field, leading to advances in stable shape generation and analysis.
    E-detectors: a nonparametric framework for sequential change detection. (arXiv:2203.03532v3 [stat.ME] UPDATED)
    Sequential change detection is a classical problem with a variety of applications. However, the majority of prior work has been parametric, for example, focusing on exponential families. We develop a fundamentally new and general framework for sequential change detection when the pre- and post-change distributions are nonparametrically specified (and thus composite). Our procedures come with clean, nonasymptotic bounds on the average run length (frequency of false alarms). In certain nonparametric cases (like sub-Gaussian or sub-exponential), we also provide near-optimal bounds on the detection delay following a changepoint. The primary technical tool that we introduce is called an \emph{e-detector}, which is composed of sums of e-processes -- a fundamental generalization of nonnegative supermartingales -- that are started at consecutive times. We first introduce simple Shiryaev-Roberts and CUSUM-style e-detectors, and then show how to design their mixtures in order to achieve both statistical and computational efficiency. Our e-detector framework can be instantiated to recover classical likelihood-based procedures for parametric problems, as well as yielding the first change detection method for many nonparametric problems. As a running example, we tackle the problem of detecting changes in the mean of a bounded random variable without i.i.d. assumptions, with an application to tracking the performance of a basketball team over multiple seasons.
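    A bare-bones sketch of the Shiryaev-Roberts-style construction described above: an e-process is started at every time step, each one multiplies per-observation e-values, and an alarm is raised when their sum crosses a threshold. The `e_value` function (mapping an observation to a valid e-value under the pre-change null) and the threshold are placeholders the user would supply.

        def sr_style_e_detector(e_value, observations, threshold=100.0):
            processes = []
            for t, x in enumerate(observations):
                processes.append(1.0)                   # new e-process started at time t
                e = e_value(x)                          # e-value for this observation
                processes = [m * e for m in processes]  # update every running product
                if sum(processes) >= threshold:
                    return t                            # declare a change point
            return None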
    Model-based Clustering using Non-parametric Hidden Markov Models. (arXiv:2309.12238v1 [math.ST])
    Thanks to their dependency structure, non-parametric Hidden Markov Models (HMMs) are able to handle model-based clustering without specifying group distributions. The aim of this work is to study the Bayes risk of clustering when using HMMs and to propose associated clustering procedures. We first give a result linking the Bayes risk of classification and the Bayes risk of clustering, which we use to identify the key quantity determining the difficulty of the clustering task. We also give a proof of this result in the i.i.d. framework, which might be of independent interest. Then we study the excess risk of the plugin classifier. All these results are shown to remain valid in the online setting where observations are clustered sequentially. Simulations illustrate our findings.
    Multiclass Learnability Does Not Imply Sample Compression. (arXiv:2308.06424v2 [cs.LG] UPDATED)
    A hypothesis class admits a sample compression scheme, if for every sample labeled by a hypothesis from the class, it is possible to retain only a small subsample, using which the labels on the entire sample can be inferred. The size of the compression scheme is an upper bound on the size of the subsample produced. Every learnable binary hypothesis class (which must necessarily have finite VC dimension) admits a sample compression scheme of size only a finite function of its VC dimension, independent of the sample size. For multiclass hypothesis classes, the analog of VC dimension is the DS dimension. We show that the analogous statement pertaining to sample compression is not true for multiclass hypothesis classes: every learnable multiclass hypothesis class, which must necessarily have finite DS dimension, does not admit a sample compression scheme of size only a finite function of its DS dimension.  ( 2 min )
    Statistical Complexity of Quantum Learning. (arXiv:2309.11617v1 [quant-ph])
    Recent years have seen significant activity on the problem of using data for the purpose of learning properties of quantum systems or of processing classical or quantum data via quantum computing. As in classical learning, quantum learning problems involve settings in which the mechanism generating the data is unknown, and the main goal of a learning algorithm is to ensure satisfactory accuracy levels when only given access to data and, possibly, side information such as expert knowledge. This article reviews the complexity of quantum learning using information-theoretic techniques by focusing on data complexity, copy complexity, and model complexity. Copy complexity arises from the destructive nature of quantum measurements, which irreversibly alter the state to be processed, limiting the information that can be extracted about quantum data. For example, in a quantum system, unlike in classical machine learning, it is generally not possible to evaluate the training loss simultaneously on multiple hypotheses using the same quantum data. To make the paper self-contained and approachable by different research communities, we provide extensive background material on classical results from statistical learning theory, as well as on the distinguishability of quantum states. Throughout, we highlight the differences between quantum and classical learning by addressing both supervised and unsupervised learning, and we provide extensive pointers to the literature.
    Online Self-Concordant and Relatively Smooth Minimization, With Applications to Online Portfolio Selection and Learning Quantum States. (arXiv:2210.00997v3 [stat.ML] UPDATED)
    Consider an online convex optimization problem where the loss functions are self-concordant barriers, smooth relative to a convex function $h$, and possibly non-Lipschitz. We analyze the regret of online mirror descent with $h$. Then, based on the result, we prove the following in a unified manner. Denote by $T$ the time horizon and $d$ the parameter dimension. 1. For online portfolio selection, the regret of $\widetilde{\text{EG}}$, a variant of exponentiated gradient due to Helmbold et al., is $\tilde{O} ( T^{2/3} d^{1/3} )$ when $T > 4 d / \log d$. This improves on the original $\tilde{O} ( T^{3/4} d^{1/2} )$ regret bound for $\widetilde{\text{EG}}$. 2. For online portfolio selection, the regret of online mirror descent with the logarithmic barrier is $\tilde{O}(\sqrt{T d})$. The regret bound is the same as that of Soft-Bayes due to Orseau et al. up to logarithmic terms. 3. For online learning quantum states with the logarithmic loss, the regret of online mirror descent with the log-determinant function is also $\tilde{O} ( \sqrt{T d} )$. Its per-iteration time is shorter than all existing algorithms we know.
    Proportional Response: Contextual Bandits for Simple and Cumulative Regret Minimization. (arXiv:2307.02108v2 [cs.LG] UPDATED)
    Simple regret minimization is a critical problem in learning optimal treatment assignment policies across various domains, including healthcare and e-commerce. However, it remains understudied in the contextual bandit setting. We propose a new family of computationally efficient bandit algorithms for the stochastic contextual bandit settings, with the flexibility to be adapted for cumulative regret minimization (with near-optimal minimax guarantees) and simple regret minimization (with SOTA guarantees). Furthermore, our algorithms adapt to model misspecification and extend to the continuous arm settings. These advantages come from constructing and relying on "conformal arm sets" (CASs), which provide a set of arms at every context that encompass the context-specific optimal arm with some probability across the context distribution. Our positive results on simple and cumulative regret guarantees are contrasted by a negative result, which shows that an algorithm can't achieve instance-dependent simple regret guarantees while simultaneously achieving minimax optimal cumulative regret guarantees.  ( 2 min )
    fakenewsbr: A Fake News Detection Platform for Brazilian Portuguese. (arXiv:2309.11052v2 [cs.CL] UPDATED)
    The proliferation of fake news has become a significant concern in recent times due to its potential to spread misinformation and manipulate public opinion. This paper presents a comprehensive study on detecting fake news in Brazilian Portuguese, focusing on journalistic-type news. We propose a machine learning-based approach that leverages natural language processing techniques, including TF-IDF and Word2Vec, to extract features from textual data. We evaluate the performance of various classification algorithms, such as logistic regression, support vector machine, random forest, AdaBoost, and LightGBM, on a dataset containing both true and fake news articles. The proposed approach achieves high accuracy and F1-Score, demonstrating its effectiveness in identifying fake news. Additionally, we developed a user-friendly web platform, fakenewsbr.com, to facilitate the verification of news articles' veracity. Our platform provides real-time analysis, allowing users to assess the likelihood of fake news articles. Through empirical analysis and comparative studies, we demonstrate the potential of our approach to contribute to the fight against the spread of fake news and promote more informed media consumption.
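    A minimal sketch of the TF-IDF branch of such a pipeline (scikit-learn; the two-sentence corpus is a placeholder for the labelled Brazilian Portuguese dataset, and logistic regression stands in for the several classifiers the paper compares):

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline

        texts = [
            "governo anuncia novo programa de vacinacao em todo o pais",    # true (0)
            "cientistas confirmam que vacina contem chip de rastreamento",  # fake (1)
        ]
        labels = [0, 1]

        clf = make_pipeline(
            TfidfVectorizer(ngram_range=(1, 2)),
            LogisticRegression(max_iter=1000),
        )
        clf.fit(texts, labels)
        print(clf.predict(["nova lei muda regras de aposentadoria"]))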
    Human-in-the-Loop Causal Discovery under Latent Confounding using Ancestral GFlowNets. (arXiv:2309.12032v1 [cs.LG])
    Structure learning is the crux of causal inference. Notably, causal discovery (CD) algorithms are brittle when data is scarce, possibly inferring imprecise causal relations that contradict expert knowledge -- especially when considering latent confounders. To aggravate the issue, most CD methods do not provide uncertainty estimates, making it hard for users to interpret results and improve the inference process. Surprisingly, while CD is a human-centered affair, no works have focused on building methods that both 1) output uncertainty estimates that can be verified by experts and 2) interact with those experts to iteratively refine CD. To solve these issues, we start by proposing to sample (causal) ancestral graphs proportionally to a belief distribution based on a score function, such as the Bayesian information criterion (BIC), using generative flow networks. Then, we leverage the diversity in candidate graphs and introduce an optimal experimental design to iteratively probe the expert about the relations among variables, effectively reducing the uncertainty of our belief over ancestral graphs. Finally, we update our samples to incorporate human feedback via importance sampling. Importantly, our method does not require causal sufficiency (i.e., unobserved confounders may exist). Experiments with synthetic observational data show that our method can accurately sample from distributions over ancestral graphs and that we can greatly improve inference quality with human aid.  ( 2 min )
    Bayesian sparsification for deep neural networks with Bayesian model reduction. (arXiv:2309.12095v1 [stat.ML])
    Deep learning's immense capabilities are often constrained by the complexity of its models, leading to an increasing demand for effective sparsification techniques. Bayesian sparsification for deep learning emerges as a crucial approach, facilitating the design of models that are both computationally efficient and competitive in terms of performance across various deep learning applications. The state-of-the-art -- in Bayesian sparsification of deep neural networks -- combines structural shrinkage priors on model weights with an approximate inference scheme based on black-box stochastic variational inference. However, model inversion of the full generative model is exceptionally computationally demanding, especially when compared to standard deep learning of point estimates. In this context, we advocate for the use of Bayesian model reduction (BMR) as a more efficient alternative for pruning of model weights. As a generalization of the Savage-Dickey ratio, BMR allows a post-hoc elimination of redundant model weights based on the posterior estimates under a straightforward (non-hierarchical) generative model. Our comparative study highlights the computational efficiency and the pruning rate of the BMR method relative to the established stochastic variational inference (SVI) scheme, when applied to the full hierarchical generative model. We illustrate the potential of BMR to prune model parameters across various deep learning architectures, from classical networks like LeNet to modern frameworks such as Vision Transformers and MLP-Mixers.
    Prodigy: An Expeditiously Adaptive Parameter-Free Learner. (arXiv:2306.06101v2 [cs.LG] UPDATED)
    We consider the problem of estimating the learning rate in adaptive methods, such as Adagrad and Adam. We describe two techniques, Prodigy and Resetting, to provably estimate the distance to the solution $D$, which is needed to set the learning rate optimally. Our techniques are modifications of the D-Adaptation method for learning-rate-free learning. Our methods improve upon the convergence rate of D-Adaptation by a factor of $O(\sqrt{\log(D/d_0)})$, where $d_0$ is the initial estimate of $D$. We test our methods on 12 common logistic-regression benchmark datasets, VGG11 and ResNet-50 training on CIFAR10, ViT training on Imagenet, LSTM training on IWSLT14, DLRM training on Criteo dataset, VarNet on Knee MRI dataset, as well as RoBERTa and GPT transformer training on BookWiki. Our experimental results show that our approaches consistently outperform D-Adaptation and reach test accuracy values close to that of hand-tuned Adam.  ( 2 min )
    Quasi-Monte Carlo for 3D Sliced Wasserstein. (arXiv:2309.11713v1 [stat.ML])
    Monte Carlo (MC) approximation has been used as the standard computation approach for the Sliced Wasserstein (SW) distance, which has an intractable expectation in its analytical form. However, the MC method is not optimal in terms of minimizing the absolute approximation error. To provide a better class of empirical SW, we propose quasi-sliced Wasserstein (QSW) approximations that rely on Quasi-Monte Carlo (QMC) methods. For a comprehensive investigation of QMC for SW, we focus on the 3D setting, specifically computing the SW between probability measures in three dimensions. In greater detail, we empirically verify various ways of constructing QMC points sets on the 3D unit-hypersphere, including Gaussian-based mapping, equal area mapping, generalized spiral points, and optimizing discrepancy energies. Furthermore, to obtain an unbiased estimation for stochastic optimization, we extend QSW into Randomized Quasi-Sliced Wasserstein (RQSW) by introducing randomness to the discussed low-discrepancy sequences. For theoretical properties, we prove the asymptotic convergence of QSW and the unbiasedness of RQSW. Finally, we conduct experiments on various 3D tasks, such as point-cloud comparison, point-cloud interpolation, image style transfer, and training deep point-cloud autoencoders, to demonstrate the favorable performance of the proposed QSW and RQSW variants.  ( 2 min )
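    As a concrete picture of one of the QMC constructions mentioned above, the NumPy sketch below builds generalized spiral points on the unit sphere and uses them as projection directions in a sliced Wasserstein estimate between two equal-size 3D point clouds (equal sizes and uniform weights are simplifying assumptions):

        import numpy as np

        def spiral_points(n):
            # Generalized spiral points: a low-discrepancy set of directions on S^2.
            i = np.arange(n) + 0.5
            phi = np.arccos(1.0 - 2.0 * i / n)        # polar angle
            theta = np.pi * (1.0 + 5 ** 0.5) * i      # golden-angle azimuth
            return np.stack([np.sin(phi) * np.cos(theta),
                             np.sin(phi) * np.sin(theta),
                             np.cos(phi)], axis=1)

        def sliced_wasserstein(X, Y, n_dirs=64, p=2):
            # Project both clouds onto each direction, then take the 1D
            # Wasserstein distance between the sorted projections.
            dirs = spiral_points(n_dirs)
            px, py = X @ dirs.T, Y @ dirs.T           # (n_points, n_dirs)
            d = np.abs(np.sort(px, axis=0) - np.sort(py, axis=0)) ** p
            return d.mean() ** (1.0 / p)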
    GLM Regression with Oblivious Corruptions. (arXiv:2309.11657v1 [cs.DS])
    We demonstrate the first algorithms for the problem of regression for generalized linear models (GLMs) in the presence of additive oblivious noise. We assume we have sample access to examples $(x, y)$ where $y$ is a noisy measurement of $g(w^* \cdot x)$. In particular, the noisy labels are of the form $y = g(w^* \cdot x) + \xi + \epsilon$, where $\xi$ is the oblivious noise drawn independently of $x$ and satisfies $\Pr[\xi = 0] \geq o(1)$, and $\epsilon \sim \mathcal N(0, \sigma^2)$. Our goal is to accurately recover a parameter vector $w$ such that the function $g(w \cdot x)$ has arbitrarily small error when compared to the true values $g(w^* \cdot x)$, rather than the noisy measurements $y$. We present an algorithm that tackles this problem in its most general distribution-independent setting, where the solution may not even be identifiable. Our algorithm returns an accurate estimate of the solution if it is identifiable, and otherwise returns a small list of candidates, one of which is close to the true solution. Furthermore, we provide a necessary and sufficient condition for identifiability, which holds in broad settings. Specifically, the problem is identifiable when the quantile at which $\xi + \epsilon = 0$ is known, or when the family of hypotheses does not contain candidates that are nearly equal to a translated $g(w^* \cdot x) + A$ for some real number $A$, while also having large error when compared to $g(w^* \cdot x)$. This is the first algorithmic result for GLM regression with oblivious noise which can handle more than half the samples being arbitrarily corrupted. Prior work focused largely on the setting of linear regression, and gave algorithms under restrictive assumptions.  ( 3 min )
    On the different regimes of Stochastic Gradient Descent. (arXiv:2309.10688v2 [cs.LG] UPDATED)
    Modern deep networks are trained with stochastic gradient descent (SGD) whose key parameters are the number of data considered at each step or batch size $B$, and the step size or learning rate $\eta$. For small $B$ and large $\eta$, SGD corresponds to a stochastic evolution of the parameters, whose noise amplitude is governed by the `temperature' $T\equiv \eta/B$. Yet this description is observed to break down for sufficiently large batches $B\geq B^*$, or to simplify to gradient descent (GD) when the temperature is sufficiently small. Understanding where these cross-overs take place remains a central challenge. Here we resolve these questions for a teacher-student perceptron classification model, and show empirically that our key predictions still apply to deep networks. Specifically, we obtain a phase diagram in the $B$-$\eta$ plane that separates three dynamical phases: $\textit{(i)}$ a noise-dominated SGD governed by temperature, $\textit{(ii)}$ a large-first-step-dominated SGD and $\textit{(iii)}$ GD. These different phases also correspond to different regimes of generalization error. Remarkably, our analysis reveals that the batch size $B^*$ separating regimes $\textit{(i)}$ and $\textit{(ii)}$ scales with the size $P$ of the training set, with an exponent that characterizes the hardness of the classification problem.  ( 2 min )
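    To make the temperature concrete with a toy calculation (the specific numbers are illustrative, not from the paper): runs that share $T = \eta/B$ should, below $B^*$, sit in the same noise-dominated regime.

        def sgd_temperature(lr, batch_size):
            # T = eta / B governs the noise amplitude in the small-B, large-eta regime.
            return lr / batch_size

        print(sgd_temperature(0.1, 32), sgd_temperature(0.025, 8))  # both 0.003125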
    Regionally Additive Models: Explainable-by-design models minimizing feature interactions. (arXiv:2309.12215v1 [cs.LG])
    Generalized Additive Models (GAMs) are widely used explainable-by-design models in various applications. GAMs assume that the output can be represented as a sum of univariate functions, referred to as components. However, this assumption fails in ML problems where the output depends on multiple features simultaneously. In these cases, GAMs fail to capture the interaction terms of the underlying function, leading to subpar accuracy. To (partially) address this issue, we propose Regionally Additive Models (RAMs), a novel class of explainable-by-design models. RAMs identify subregions within the feature space where interactions are minimized. Within these regions, it is more accurate to express the output as a sum of univariate functions (components). Consequently, RAMs fit one component per subregion of each feature instead of one component per feature. This approach yields a more expressive model compared to GAMs while retaining interpretability. The RAM framework consists of three steps. Firstly, we train a black-box model. Secondly, using Regional Effect Plots, we identify subregions where the black-box model exhibits near-local additivity. Lastly, we fit a GAM component for each identified subregion. We validate the effectiveness of RAMs through experiments on both synthetic and real-world datasets. The results confirm that RAMs offer improved expressiveness compared to GAMs while maintaining interpretability.  ( 2 min )
    Global universal approximation of functional input maps on weighted spaces. (arXiv:2306.03303v2 [stat.ML] UPDATED)
    We introduce so-called functional input neural networks defined on a possibly infinite dimensional weighted space with values also in a possibly infinite dimensional output space. To this end, we use an additive family as hidden layer maps and a non-linear activation function applied to each hidden layer. Relying on Stone-Weierstrass theorems on weighted spaces, we can prove a global universal approximation result for generalizations of continuous functions going beyond the usual approximation on compact sets. This then applies in particular to approximation of (non-anticipative) path space functionals via functional input neural networks. As a further application of the weighted Stone-Weierstrass theorem we prove a global universal approximation result for linear functions of the signature. We also introduce the viewpoint of Gaussian process regression in this setting and show that the reproducing kernel Hilbert spaces of the signature kernels are Cameron-Martin spaces of certain Gaussian processes. This paves the way towards uncertainty quantification for signature kernel regression.  ( 2 min )
    The Broad Impact of Feature Imitation: Neural Enhancements Across Financial, Speech, and Physiological Domains. (arXiv:2309.12279v1 [cs.LG])
    Initialization of neural network weights plays a pivotal role in determining their performance. Feature Imitating Networks (FINs) offer a novel strategy by initializing weights to approximate specific closed-form statistical features, setting a promising foundation for deep learning architectures. While the applicability of FINs has been chiefly tested in biomedical domains, this study extends its exploration into other time series datasets. Three different experiments are conducted in this study to test the applicability of imitating Tsallis entropy for performance enhancement: Bitcoin price prediction, speech emotion recognition, and chronic neck pain detection. For the Bitcoin price prediction, models embedded with FINs reduced the root mean square error by around 1000 compared to the baseline. In the speech emotion recognition task, the FIN-augmented model increased classification accuracy by over 3 percent. Lastly, in the CNP detection experiment, an improvement of about 7 percent was observed compared to established classifiers. These findings validate the broad utility and potency of FINs in diverse applications.  ( 2 min )
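For reference, the closed-form feature being imitated here, Tsallis entropy S_q = (1 - sum_i p_i^q) / (q - 1), is straightforward to compute directly. The sketch below is only that reference computation with an illustrative histogram estimator, not a Feature Imitating Network itself.

import numpy as np

def tsallis_entropy(x, q=2.0, bins=32):
    # Empirical probabilities from a histogram of the signal window
    hist, _ = np.histogram(x, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    if np.isclose(q, 1.0):
        return -np.sum(p * np.log(p))          # q -> 1 recovers Shannon entropy
    return (1.0 - np.sum(p ** q)) / (q - 1.0)

window = np.random.randn(256)                  # e.g. one window of a price or audio series
print(tsallis_entropy(window, q=2.0))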

  • Open

    This is an actual barcode created by AI as a piece of art. Scan it for its secret message if you don't believe me..
    submitted by /u/glenniszen [link] [comments]  ( 8 min )
    Issue with voice.ai — no sound comes out after feeding a prerecorded audio clip through the voice filter!
So I created a voice profile based on recordings of my own voice over on voice.ai. The audio was crisp, loud and clear. I only added about 16 minutes of it though. (A minimum of 15 minutes is recommended.) When I take a prerecorded voice clip and feed it through the voice profile I created, it's just silent for all 15 seconds that it grants you before you upgrade. Then it has the little 2 second audio watermark at the end, where it says "voice AI," and that I can hear perfectly fine! I've searched all over for others who might have had this problem, but it seems like the audio problems people have reported tend to center around the live mode failing to function properly. Has anyone had this issue before? Should I add more audio to the voice profile to make it more well rounded perhaps? If so, how much audio do you typically need to create a solid custom voice profile? Should I just update to the paid version? I don't really think that would fix it, and I wanted to wait to upgrade until I had some proof that it worked, yanno? I'm not sure what to do… Any ideas? submitted by /u/WhenTheFoxGRINS [link] [comments]  ( 9 min )
    Looking for the best AI Story Generator? Meet NovelGPT.
    submitted by /u/Senior_tasteey [link] [comments]  ( 8 min )
    AI Startup Buzz Is Facing a Reality Check
    Founders and venture capitalists who flocked to artificial-intelligence startups are learning that turning the chatbot buzz into successful businesses is harder than it seems. Source https://www.wsj.com/tech/ai/ai-startup-buzz-is-facing-a-reality-check-e34babfe submitted by /u/NuseAI [link] [comments]  ( 8 min )
    AI — weekly megathread!
    News provided by aibrews.com Genmo releases a new text-to-video model: Genmo Replay v0.1, which generate high-quality videos from text without the need for advanced prompt engineering. Genmo is available for free to create AI videos [Details | Genmo Replay] . OpenAI unveils DALL·E 3 - a major update to the text-to-image model, which will be integrated in ChatGPT. It will be available to ChatGPT Plus and Enterprise users in October, via the API and in Labs later this fall. Creators can now also opt their images out from future training [Details]. Toyota Research Institute has developed a technique, powered by generative AI, that enables teaching robots new manipulation abilities in a single afternoon. Using the same robot, same code, and same setup, TRI taught over 60 different dextero…  ( 11 min )
LongLoRA: New method extends LLAMA2 7B to 100k context length, 70B to 32k context length on a single 8 × A100 machine
    As AI models get bigger, training them requires more and more computing power. Researchers are looking for ways to train these large AI models without needing Google-scale resources. A new paper proposes LongLoRA, a fine-tuning approach that can extend LLaMA2 7B to 100k context length and 70B model to 32k context length on a single 8× A100 machine. Here are my highlights from the paper: Big one of course: LongLoRA efficiently fine-tunes large AI models on longer texts Key points: Approximates standard attention via "shift short attention" during training Tuning only a subset of weights (LoRA) plus some embeddings & norms Fine-tuned 7B parameter model on 100k tokens with 1 machine Way lower training cost than full fine-tuning for large contexts Close to full fine-tuning performance The core insight is that an approximation of full attention enables efficient training while retaining standard attention for final inference. Combined with selective weight tuning, this really reduces compute needs. I think this demonstrates the potential to train more capable AI without unreasonable resources. Efficient training techniques = more powerful LLMs for the same resources. Full summary here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
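The LoRA half of the recipe is easy to sketch: keep the pretrained weight frozen and learn only a low-rank update. The module below illustrates that idea in PyTorch; it is a generic sketch, not the LongLoRA codebase, and the shift-short-attention part is not shown.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # y = x W^T + (alpha/r) * x A^T B^T, with W frozen and only A, B trained
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)                # frozen pretrained weight
        self.A = nn.Parameter(0.01 * torch.randn(r, in_features))
        self.B = nn.Parameter(torch.zeros(out_features, r))   # zero init: the update starts at 0
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(512, 512)
print(layer(torch.randn(4, 128, 512)).shape)                  # (batch, seq, dim) preserved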
    Information technology industry is at 35% of AI adoption in the US
It's currently in 4th place after marketing, consulting, and accounting, and is mostly used in the software testing field to: achieve more accurate results, get larger test coverage, benefit from a low learning curve, and complete QA faster. I was actually quite surprised when I read this; I thought that information technology would at least be in the TOP-3 industries. Do you think it can happen by the end of this year? submitted by /u/unbalanced_mind [link] [comments]  ( 9 min )
    Why ChatGPT isn’t conscious – but future AI systems might be | "Different theories of consciousness suggest some basic properties we might expect a conscious system to have"
    submitted by /u/Tao_Dragon [link] [comments]  ( 8 min )
Want to get the GTA 4 navigation voice into Google Maps: is it possible using AI?
    hello everyone there was a voice-guided navigation system for GTA 4 with a female voice, I would love to use that as my Google Maps and Waze navigation voice I tried looking for anything online but the mods available are of very inferior quality and from a few years ago which won't work properly with Google Maps as a lot of the voice commands are missing the sample for the voice can be found in these 2 YouTube videos first sample 11 minutes repeated statements 2nd sample 1 minute without repeated statements Using Adobe podcast I can remove a lot of the background noise The problem is I don't know much about sound tuning so using whatever free ai I could find on the internet the voice is not being cloned properly it is sounding mechanical or just like google ttss i have a decen…  ( 10 min )
    DALL-E 3 & ChatGPT: The Game-Changing AI Tool for Text-to-Image Generation
    submitted by /u/Senior_tasteey [link] [comments]  ( 8 min )
    Free Unlimited Face Swap Tool You Can Use in Browser
    submitted by /u/Vegetable_Tutor8245 [link] [comments]  ( 8 min )
    Claude LLM, draft a strong speech for a world leader predicting that by 2030 our AI revolution could make the world so wonderful that we can safely close our prisons.
    Here is a draft speech focusing on the potential for AI to help create a more just and peaceful world by 2030: My fellow citizens, We stand today on the cusp of a new era. The rapid development of artificial intelligence technology presents us with an unparalleled opportunity to remake our world for the better. Though this technology poses risks, I believe that if guided responsibly, AI can help us build a society of prosperity, equity, and justice. Over the past decade, we have already seen glimpses of AI's vast potential for good. Algorithms are helping doctors diagnose diseases more accurately. Smart systems are optimizing transportation networks and reducing traffic accidents. Machine learning is enabling more efficient energy use and sustainable agriculture. But this is only the…  ( 10 min )
  • Open

    [N] Splash Pro drops generative music model and comparison to other models
Seems like a strong contender in this space, plus commercial use: https://www.splashmusic.com/music-generation submitted by /u/No-Reference8836 [link] [comments]  ( 8 min )
    [R] BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model
    Hello Reddit! I'm Daria from the Cerebras ML team, and I've got some thrilling updates to share with you! 🚀 We've recently published our BTLM-3B-8K paper on arXiv, and I’m excited to share that I am one of the authors! Check it out here: https://arxiv.org/abs/2309.11568 It distills our recipe for training SOTA LLMs: An extensively deduplicated dataset: SlimPajama Hyperparameter search using muP Variable sequence length training + ALiBi Aggressive LR decay Our BTLM-3B-8K not only sets a new standard for 3B parameter models but also frequently surpasses the performance of 7B models! In the paper, we meticulously outline how we developed the BTLM model and conducted a comprehensive analysis of its performance on 22 validation benchmarks. These benchmarks span a range of capabilities including common sense reasoning, world knowledge, reading comprehension, code generation, long sequence interpolation/extrapolation, bias, toxicity, and misinformation. For those eager to dive in, we've made our SlimPajama dataset and the BTLM-3B-8K model available on Hugging Face: https://huggingface.co/cerebras 🎉 Feel free to delve into the details, explore the dataset and model, and let us know your thoughts, insights, or questions! We're here to discuss and excited to hear your feedback. Happy exploring! 🚀 submitted by /u/daria-sobol [link] [comments]  ( 9 min )
    [D]: Is There Any Followup To Effect Of Model Size on LoRA Rank "r"?
    Hello all, I am re-reading the LoRA paper (https://arxiv.org/abs/2106.09685) to get a deeper understanding of some of the analysis the authors perform at the end and saw this line Note that the relationship between model size and the optimal rank for adaptation is still an open question. Does anybody know of any resources out there that looked into this question, given that LoRA has been around for a little bit now? Perhaps someone has performed similar subspace overlap / optimal "r" value studies on some of the LLMs that fall in-between GPT2 and 3, i.e. some of the ~7B, ~15B, ~40B and ~70B models? submitted by /u/lightSpeedBrick [link] [comments]  ( 9 min )
    [D] When do we use the instruct version of a LLM?
If censorship isn't an issue for me, when there's an instruct version of an LLM, is it generally always better to use the instruct version than the non-instruct version (because instruct versions tend to hallucinate less)? Apart from censorship and hallucinations, are there any other pros and cons between instruct vs. non-instruct versions? submitted by /u/--leockl-- [link] [comments]  ( 9 min )
    [R] How to reduce hallucinations using Chain Of Verification in Large Language Models
This new paper from Shehzaad Dhuliawala et al. (2023) introduces a combination of prompting and consistency checks made by the LLM itself. Implementing this technique actually made me like gpt-3.5 again! I wrote a tutorial on how to actually implement this method: https://advanced-stack.com/resources/how-to-reduce-hallucinations-using-chain-of-verification-cov-in-large-language-models.html Let me know if you find it useful. submitted by /u/Fluid-Age-9266 [link] [comments]  ( 9 min )
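For readers who want the shape of the method without the tutorial, here is a minimal sketch of the Chain-of-Verification loop. `llm` is a placeholder for whatever completion call you use, and the prompt templates are illustrative, not the paper's exact ones.

def chain_of_verification(question, llm):
    # 1. Draft an initial answer
    draft = llm(f"Answer the question.\nQuestion: {question}\nAnswer:")
    # 2. Plan short fact-checking questions about the draft
    plan = llm(
        "List short fact-checking questions (one per line) that would verify this answer.\n"
        f"Question: {question}\nDraft answer: {draft}\nVerification questions:"
    )
    checks = [q.strip() for q in plan.splitlines() if q.strip()]
    # 3. Answer each verification question independently of the draft
    evidence = "\n".join(
        f"Q: {q}\nA: {llm('Answer concisely and factually: ' + q)}" for q in checks
    )
    # 4. Produce a final answer that corrects anything the verifications contradict
    return llm(
        f"Original question: {question}\nDraft answer: {draft}\n"
        f"Verification Q&A:\n{evidence}\nWrite a corrected final answer:"
    )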
    [D] Looking for suggestions
    Hi guys, a final year CS student here. I want to create a portfolio to showcase my skills in ML and DL. I have knowledge in docker and have access to google cloud platform to deploy. Now, I am unable to find any project that stands out. Could u suggest something that I could learn from as well as looks pretty on my cv? submitted by /u/Virtual_Heron_7417 [link] [comments]  ( 9 min )
    Transformers: I can't fathom the concept of dynamic weights in attention heads [R]
    Hey everyone, I've been diving deep into the world of neural networks, and recently, I've been particularly intrigued by the dynamicity of attention head parameters (weights). These weights play a crucial role in transformers, and understanding how they change during training and inference can provide valuable insights into model behavior. The question is, what does dynamic mean in this context? Is it input-adaptive? Do weights change at inference time according to inputs? I have a hard time understanding this concept, for me, weights are static and pre-established at training time. submitted by /u/assalas23 [link] [comments]  ( 9 min )
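A small numerical check of the distinction the post asks about: the trained projection matrices are static at inference time, while the attention matrix softmax(QK^T/sqrt(d)) is recomputed from every input. The toy dimensions below are arbitrary.

import torch

d = 16
torch.manual_seed(0)
W_q, W_k = torch.randn(d, d), torch.randn(d, d)       # learned, then fixed ("static") weights

def attention_pattern(x):                              # x: (seq_len, d)
    Q, K = x @ W_q, x @ W_k
    return torch.softmax(Q @ K.T / d ** 0.5, dim=-1)   # input-dependent attention weights

a1 = attention_pattern(torch.randn(5, d))
a2 = attention_pattern(torch.randn(5, d))
print(torch.allclose(a1, a2))                          # False: same parameters, different attention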
    Help Shape the Future of A.I.! Take a Quick 2-minute Survey for Academic Research [R]
    Hello guys! ​ Are you intrigued by the world of Artificial Intelligence? I am conducting a brief and insightful survey on AI for academic research purposes. It'll take just 2 minutes of your time, and your valuable insights will contribute to our understanding of AI trends and perspectives. Whether you're a tech enthusiast, a data wizard, or just curious about the future of AI, your input is incredibly valuable! ​ Click the link below to share your thoughts and help to understand this new technology: ​ https://forms.gle/7fbbkc1f2iBPXHJV7 ​ Thank you in advance for being a part of this exciting AI study! #AIResearch #AI #Artificial Intelligence #SurveyTime submitted by /u/JukeboxNV [link] [comments]  ( 9 min )
[R] LongLoRA: New method extends LLAMA2 7B to 100k context length, 70B to 32k context length on a single 8 × A100 machine
    As AI models get bigger, training them requires more and more computing power. Researchers are looking for ways to train these large AI models without needing Google-scale resources. A new paper proposes LongLoRA, a fine-tuning approach that can extend LLaMA2 7B to 100k context length and 70B model to 32k context length on a single 8× A100 machine. Here are my highlights from the paper: Big one of course: LongLoRA efficiently fine-tunes large AI models on longer texts Key points: Approximates standard attention via "shift short attention" during training Tuning only a subset of weights (LoRA) plus some embeddings & norms Fine-tuned 7B parameter model on 100k tokens with 1 machine Way lower training cost than full fine-tuning for large contexts Close to full fine-tuning performance The core insight is that an approximation of full attention enables efficient training while retaining standard attention for final inference. Combined with selective weight tuning, this really reduces compute needs. I think this demonstrates the potential to train more capable AI without unreasonable resources. Efficient training techniques = more powerful LLMs for the same resources. Full summary here. Arxiv paper: https://arxiv.org/pdf/2309.12307.pdf submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [D] Machine learning jobs
    Hello, looking for job opportunities as a data scientist or machine learning engineer. Any help would be appreciated. Thanks!! submitted by /u/ArachnidFun2671 [link] [comments]  ( 8 min )
    [Research] Modelling tabular data with diffusion models (Blog post)
    Denoising diffusion probabilistic models are becoming the leading paradigm of generative modeling for many important data modalities. TabDDPM is a diffusion model that can be universally applied to any tabular dataset and handles any type of feature. Blog post link. submitted by /u/metkere [link] [comments]  ( 9 min )
    [P] Advice needed for what tool/algorithm is appropriate
Context of the problem: I have the following entities: stations, programs, project managers, days, and time slots. The goal of the problem is to decide which program with which project manager to assign on what day at what time slot for each station. Some notes: Each program has its own duration. Say Program A lasts for 50 minutes, Program B is 30 minutes, etc. Each project manager has qualifications on which programs he/she can handle. This constraint is a hard constraint. Time slots run from 6AM to 6PM. This means that if a program which lasts for 30 minutes is assigned to start at 6:00AM, then it will end at 6:30AM. Only one program can be assigned in each station, so there should be no overlap in programs in terms of day and time per station. There is a forecasting model that takes a…  ( 10 min )
    [P] Introducing GeoCOCO: Easily transform GIS annotations into Microsoft's Common Objects In Context (COCO) datasets for use in deep learning
    https://github.com/jaspersiebring/geococo Introducing GeoCOCO, an open source project that enables users to turn their GIS annotations (e.g. shapefile) into COCO datasets which can then be used to train computer vision models! It allows users to use the likes of QGIS, ArcGIS to annotate geographic imagery in the same way you would annotate non-geographic imagery with LabelMe. It is powered by Python and a variety of packages (e.g. pydantic, pandera, geopandas, pycocotools). On the more meta side, it also features automated tests/builds/releases through Github Actions (using the likes of poetry, ruff, mypy, pytest, black). Sharing it with you guys in case someone else might find it useful! I am also very interested to hear some feedback (suggestions, flaws, etc.), let me know! ​ Here's…  ( 9 min )
    [D] Finding linkedIn article on anomaly detection
Finding a LinkedIn article on anomaly detection. Last week I saw a LinkedIn article on anomaly detection. In that post, the tail movement of a mouse running on a rotating rod (roller) is captured with CV and plotted as a time series. Then, based on the tail-movement time series, anomalies in that series are detected where the mouse loses its balance on the rotating rod. I am trying to find this post on LinkedIn but still haven't been able to get any clue. If you have seen this research article or have any clue please let me know. It would be a great help. I appreciate your time and help. Thank you very much! ❤️ [D] [R] submitted by /u/isurusachitha [link] [comments]  ( 9 min )
    [R] Parallelizing RNN over its sequence length
I am really excited to share our newest work in deep learning: parallelizing RNNs! https://arxiv.org/abs/2309.12252 RNNs are thought to be non-parallelizable because of their inherent sequential nature: the state depends on the previous state. This makes training RNNs on long sequences take a long time compared to other architecture classes (like CNNs). What we present is an algorithm based on Newton's method to evaluate and train RNNs in parallel. In one of our experiments, we can achieve >1000x faster evaluation of a small GRU compared to the common sequential method on a very long sequence. Training RNNs with our algorithm could also be more than 10x faster than training with the sequential method. However, it's not without flaws. There are 2 major drawbacks we noticed: non-convergence and sca…  ( 10 min )
    [D] Is running an open sourced LLM in the cloud via GPU generally cheaper than running a closed sourced LLM?
    Assuming using the same cloud service, Is running an open sourced LLM in the cloud via GPU generally cheaper than running a closed sourced LLM? (ie. do we pay a premium when running a closed sourced LLM compared to just running anything on the cloud via GPU?) One eg. I am thinking of is running Llama 2 13b GPTQ in Microsoft Azure vs. GPT-3.5 Turbo. I understand there are a lot of parameters to consider (such as choosing which GPU to use in Microsoft Azure etc.), but I am really looking at what’s the cheapest way to run Llama 2 13b GPTQ or a performance-equivalent closed sourced LLM. submitted by /u/--leockl-- [link] [comments]  ( 9 min )
    [P] State-of-the-art Image-to-Image generators (open-source)
    Hi all. I am curious to know what is cutting edge in this domain? The use-case is creating an image of myself that is older than I am. I've looked a bit into StarGAN but I've never done work in this domain and don't know if this is still used today. Any help would be appreciated! submitted by /u/ProudOwner_of_Fram [link] [comments]  ( 9 min )
    Gradient of Langevin Dynamics Step w.r.t model parameters [D]
I am reading the following paper about self-supervised learning. Briefly, their idea for self-supervised learning is to reconstruct a corrupted image (e.g., random masking) using a Langevin step of a learned energy function, and the reconstructed image is compared to the clean image for supervision. This should work because going along the decreasing energy value ensures that corrupted images have high energy and clean images have low energy. Everything made sense to me until I tried to implement it. In order to optimize the parameters (\theta), we have to take the gradient of the loss w.r.t. \theta, and by the chain rule we will compute the gradient of the reconstructed x (\tilde x) w.r.t. \theta. Is this even correct? What am I even talking about, I am confused. Anyway, they provided PyTorch pseudocode and I have written the actual PyTorch code. In my implementation the model parameters never change, no matter what the values of the step size (alpha) or the learning rate are. Am I missing something? Any help is appreciated. submitted by /u/ThoughtOk5558 [link] [comments]  ( 9 min )
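A common cause of exactly this symptom is that the Langevin step is computed without keeping the graph, so the loss on \tilde x never reaches \theta. A minimal PyTorch sketch (illustrative network and shapes, noise term omitted) of a step that does backpropagate to the parameters:

import torch
import torch.nn as nn

energy = nn.Sequential(nn.Linear(10, 64), nn.Softplus(), nn.Linear(64, 1))
opt = torch.optim.Adam(energy.parameters(), lr=1e-3)
alpha = 0.1

x_clean = torch.randn(8, 10)
x_corrupt = (x_clean + torch.randn_like(x_clean)).requires_grad_(True)

e = energy(x_corrupt).sum()
grad_x = torch.autograd.grad(e, x_corrupt, create_graph=True)[0]  # keep the graph w.r.t. theta
x_tilde = x_corrupt - alpha * grad_x                              # one Langevin step (noise omitted)

loss = ((x_tilde - x_clean) ** 2).mean()                          # reconstruction supervision
opt.zero_grad()
loss.backward()                                                   # gradients now reach theta through grad_x
opt.step()

If the inner gradient is taken with create_graph=False (or inside torch.no_grad()), loss.backward() leaves the energy parameters untouched, which matches the behaviour described above.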
  • Open

    Improving your LLMs with RLHF on Amazon SageMaker
    In this blog post, we illustrate how RLHF can be performed on Amazon SageMaker by conducting an experiment with the popular, open-sourced RLHF repo Trlx. Through our experiment, we demonstrate how RLHF can be used to increase the helpfulness or harmlessness of a large language model using the publicly available Helpfulness and Harmlessness (HH) dataset provided by Anthropic. Using this dataset, we conduct our experiment with Amazon SageMaker Studio notebook that is running on an ml.p4d.24xlarge instance. Finally, we provide a Jupyter notebook to replicate our experiments.  ( 11 min )
  • Open

    Doing graph + tabular analytics directly on modern data lakes
    A podcast with Weimo Liu and Sam Magnus of PuppyGraph Open source Apache Iceberg, Hudi and Delta Lake have made it possible to dispense with the complexities and duplication of data warehousing. Instead of requiring time-consuming extract, transform and load (ETL) procedures, these large table formats make it simple to tap S3 and other repositories… Read More »Doing graph + tabular analytics directly on modern data lakes The post Doing graph + tabular analytics directly on modern data lakes appeared first on Data Science Central.  ( 19 min )
    AI in finance: Addressing hurdles on the path to transformation
    Discover the obstacles hindering seamless AI adoption in financial services and gain actionable insights to navigate regulatory compliance, data security, organizational change, and more. The post AI in finance: Addressing hurdles on the path to transformation appeared first on Data Science Central.  ( 22 min )
  • Open

    Primes, weeds, and military precision
    Here’s a quote from Don Zagier that I found in Larry Rolen’s lecture notes on modular forms. There are two facts about the distribution of prime numbers of which I hope to convince you so overwhelmingly that they will be permanently engraved in your hearts. The first is that, despite their simple definition and role […] Primes, weeds, and military precision first appeared on John D. Cook.  ( 5 min )
    Continued fractions as matrix products
    A continued fraction of the form with n terms can be written as the composition where As discussed in the previous post, a Möbius transformation can be associated with a matrix. And the composition of Möbius transformations is associated with the product of corresponding matrices. So the continued fraction at the top of the post […] Continued fractions as matrix products first appeared on John D. Cook.  ( 5 min )
    Fractional linear and linear
A function of the form f(z) = (az + b)/(cz + d) where ad − bc ≠ 0 is sometimes called a fractional linear transformation or a bilinear transformation. I usually use the name Möbius transformation. In what sense are Möbius transformations linear transformations? They're nonlinear functions unless b = c = 0. And yet they're analogous to linear transformations. For starters, […] Fractional linear and linear first appeared on John D. Cook.  ( 6 min )
  • Open

    Driving where no Autonomous Vehicle has driven before!
    submitted by /u/shani_786 [link] [comments]  ( 8 min )
Conferences for Reinforcement Learning related to multi-agent systems, game theory, or other techniques
Hi everyone, I'm a newbie in the domain of reinforcement learning (RL). My main research is on various software systems, mainly multi-agent systems (MAS). AAMAS, a conference focusing on multi-agent systems, is I think a good conference for MAS work. Recently, many MAS researchers have been combining it with reinforcement learning to do some interesting things. Game theory is also an interesting area, and I'm curious how to use it within RL and agents. Can anyone recommend a conference or journal for RL? Venues for pure RL algorithms are great too, but I would prefer ones concerning RL applied to software systems for some problem. Thanks all. submitted by /u/DryAir1198 [link] [comments]  ( 9 min )
    help me solve this weird error
trying to make a frozen lake game but keep getting this weird error and I don't know how to fix it. The error is "too many values to unpack (expected 4)" on the line: next_state, reward, done, _ = env.step(action)

import gym
import numpy as np
import pygame
from pygame.locals import QUIT
import tensorflow as tf
from tensorflow import keras
import warnings

def ignore_specific_warning():
    warnings.filterwarnings("ignore", message="This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.")

ignore_specific_warning()

input_size = env.observation_space.n
output_size = env.action_space.n

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(input_size,), use_bias=False),
    tf.keras.layers.Dense(output_size, use_bias=False)])

loss_fns = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

discount_factor = 0.99
learning_rate = 0.1
num_episodes = 1000

for i in range(num_episodes):
    state = env.reset()
    done = False
    while not done:
        env.render()
        epsilon = 0.1
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            # Q_values = model.predict(tf.one_hot(state[0], input_size))
            # action = np.argmax(Q_values)
            state_one_hot = tf.one_hot(state[0], input_size)
            Q_values = model.predict(tf.reshape(state_one_hot, (1, -1)))
            action = np.argmax(Q_values)
        next_state, reward, done, _ = env.step(action)
        target = reward * discount_factor * np.max(model.predict(tf.one_hot(next_state, input_size)))
        with tf.GradientTape() as tape:
            Q_values = model(tf.one_hot(next_state, input_size))
            loss = loss_fns(Q_values[0][action], target)
        gradients = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))
        state = next_state
env.close()

submitted by /u/BeastHunterrr [link] [comments]  ( 9 min )
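The unpack error is most likely an API-version issue: recent Gym (>= 0.26) and Gymnasium return five values from env.step() and an (observation, info) pair from env.reset(). A minimal sketch of the adjusted calls, assuming FrozenLake-v1 and Gymnasium:

import gymnasium as gym          # or `import gym` with gym >= 0.26

env = gym.make("FrozenLake-v1")
state, info = env.reset()        # reset now returns (obs, info)
done = False
while not done:
    action = env.action_space.sample()
    next_state, reward, terminated, truncated, info = env.step(action)  # five return values
    done = terminated or truncated
    state = next_state
env.close()

Alternatively, pinning an older gym release keeps the four-value API the posted code expects.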
  • Open

    Why Open Source AI Will Win
    submitted by /u/nickb [link] [comments]  ( 8 min )
    Distilling step-by-step: Outperforming larger language models with less training data and smaller model sizes
    submitted by /u/nickb [link] [comments]  ( 8 min )
Help answering questions about neural networks for my father (retired Computer Science and EE professor) who (for medical reasons) cannot post them himself.
    My father, a retired Computer Science professor and pioneer in network design algorithms, has asked me to post two questions he has as he attempts to teach himself about neural networks. For physical/medical reasons he can’t post them himself. Grateful for any answers or suggestions for where he could find them! His note: I have many questions, but they all revolve around two issues. The first is when the weights in the neural net converge, do they exhibit any recognizable pattern. I realize that when the number of nodes and levels in the net are large it may be impossible to recognize the pattern. But suppose that the problem being examined is a simple categorization and the number of nodes in the net is small, is it possible to see a pattern in the weights? The network is supposed to …  ( 10 min )

  • Open

    I'm making my capstone project at Harvard freely available for everyone (until credit runs out). It uses ChatGPT3/4 in the backend. Experimenting with the finetuned model now.
    submitted by /u/Raymondlkj [link] [comments]  ( 8 min )
    Help bringing some peace to my family.
    I am looking for a way that I can take a bunch of saved voicemails from my mom and be able to hear her voice again. It would mean the world to my family and if my kids could hear her voice I know it would brighten their day. Can anyone point me in the right direction to accomplish this? submitted by /u/blbjtb [link] [comments]  ( 9 min )
    Don’t exclude AI-generated art from copyright
    The US Copyright Office has ruled that AI-generated art cannot be copyrighted, raising questions about whether AI-generated art should be excluded from copyright protection. The Copyright Office argues that AI-generated art is a 'merely mechanical' process with no novelty, invention, or originality, and therefore does not deserve copyright protection. Critics, however, argue that this approach is not scalable and fails to consider the creative choices made by AI artists. They suggest that AI-generated art should be treated similarly to photography, where even though the image is captured mechanically, it still reflects the creative choices of the photographer and is eligible for copyright protection. Photographers are able to own the copyright in their photographs because they make creative judgments about where to point the camera, when to snap the image, and how to adjust camera settings. Similarly, AI artists explore the 'latent space' of images that AI software can produce, making creative judgments about which images to select and explore. While the actual image is produced by the software, the important point is that a human being is making creative decisions about the art. Therefore, critics argue that AI-generated art should be eligible for copyright protection, as it reflects the creative choices and judgments of the AI artist. Source : https://www.understandingai.org/p/dont-exclude-ai-generated-art-from submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Looking for a Free Chatbot Service – Custom Base Prompt and API Access Needed
    I'm looking for a free chatbot service that allows base prompt customization and offers API access (so I can dynamically change the stock on hand to better assist customers, for example). I looked into https://ora.ai, but it seems you can only set the base prompt once and manually. I'm hoping to find a service that allows for more flexibility. The service must also be free and offer embedding, like https://ora.ai/, (with div) submitted by /u/LimeLom1 [link] [comments]  ( 9 min )
    Leading Theory of Consciousness (and why even the most advanced AI can't possess it) Slammed as "Pseudoscience"
    Consciousness theory slammed as ‘pseudoscience’ — sparking uproar (Nature) The irony here is that I mostly agree with this theory - but the article reflects how little we really know about consciousness and how it works, and how what's considered the "expert opinion" that AI can't possess consciousness is arguably influenced more by popularity than real empirical evidence. By whatever mechanism, they can respond to their treatment in unexpectedly humanlike ways. Oh, and by the way, did you think that "sentient Bing" was finally dead? Think again. submitted by /u/kamari2038 [link] [comments]  ( 9 min )
    The AI Revolution is Rotten to the Core
    submitted by /u/Hazzman [link] [comments]  ( 8 min )
    Best VoicifyAi alternatives?
I was thinking of signing up to VoicifyAi for fun and creating some custom covers, but are there better (or even free) alternatives? submitted by /u/quantummufasa [link] [comments]  ( 8 min )
    [I read the paper for you] LLMs compress images 43% better than PNG, and audio nearly 2x better than MP3
Edit: FLAC is the tested audio extension, not MP3 I read the new paper from DeepMind so you don't have to. Here are the key highlights: Despite training on text, language models compressed images 43% better than PNG, and audio nearly 2x better than FLAC. Confirmation of scaling laws - bigger models compressed better. But model size must match dataset size. There are tradeoffs between model scale, data size, and compression performance. More data enables bigger models. Tokenization (like BPE) generally hurts compression slightly by making prediction harder. Longer contexts let models exploit more sequential dependencies. Implications: Models have learned very general capabilities beyond just text. Their strong compression reflects deep understanding of images, audio etc statistically. I got some new perspective on model scaling laws and links between prediction and generalization. There's potential for practical applications compressing images, video etc. But large model size is an issue. Overall it shows these models are very capable general purpose learners, not just for language. Full summary here if you want more details. Original paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    New AI Art Style
This AI-generated art style is gaining a lot of traction on the internet. So I wanted to share how you can create art like this in under 2 minutes: • Visit the Fusion Art website and sign up for free (https://quickqr.art/app/fusion-art) • Choose a template or upload your reference template image. • Describe your image to tailor the output. • Tweak the aspect ratio and explore added features. • Hit "Generate" That's it! Share your art to showcase what you come up with. Hope this was helpful for anybody looking to create this style of art! submitted by /u/IndifferentSpectat0r [link] [comments]  ( 9 min )
    Now that DALL-E 3 is getting integrated with ChatGPT, will you switch from Midjourney and others?
    submitted by /u/Vinitneo [link] [comments]  ( 8 min )
    I used Riffusion to generate an AI saxophonist to jam with me, responding to what I played on guitar
    submitted by /u/daveNZL [link] [comments]  ( 8 min )
    Mastering Prompt Engineering: Best Prompt Pattern to Use.
    Hello, Reddit! At its core, Prompt Engineering is all about commanding the conversation with AI. Wanna write a poem, draft a project plan, or break down complex scientific terms? The right prompt can do that for you. If you want to learn more about Prompt Engineering feel free to join r/PromptWizards. Let's dive into some Prompt Engineering concepts: Let's start with the basics: Reductive operations: These yield responses that are smaller in size than the input prompt. Essentially, they distill information. Example: If the input prompt is a long article, and the task is to provide a summary, the operation used is reductive as the output (summary) is smaller than the input (the entire article). Transformational operations: These make sure the output maintains a relative balance w…  ( 10 min )
    ChatGPT/Bard/Claude one paragraph challenge: "In one paragraph, what is the best use of AI for humanity?" You be the judge
    ChatGPT Plus: The best use of AI for humanity is to augment human capabilities, address pressing global challenges, and improve the quality of life for all. By analyzing vast amounts of data at unprecedented speeds, AI can aid in medical research, predict and mitigate natural disasters, optimize energy consumption, and personalize education to individual needs. Moreover, it can provide solutions to complex problems, such as climate change, resource scarcity, and public health crises, ensuring a more sustainable and prosperous future for generations to come. Bard: The best use of AI for humanity is to use it to solve problems that will make the world a better place for everyone. AI has the potential to improve our lives in many ways, such as developing new and more effective ways to diag…  ( 10 min )
    What do you think of open-source AI?
    Hugging Face CEO Clem Delangue says open-source AI was vital to starting his company. Now, he finds himself defending open AI models as Washington considers new regulations. On POLITICO Tech, Delangue explains why he views open-source AI as not only safe, but necessary to prevent big tech companies from gaining more market power. Listen for more: https://politico-tech.simplecast.com/episodes/the-hugging-face-case-for-open-ai submitted by /u/smo279 [link] [comments]  ( 9 min )
    No idea which ai to use
    I have a product and I need product photos. It is a bottle and I am wanting ai to make a realistic picture of a woman holding the bottle and smiling. Can this be done? If so, what software/website/app do I use for this? Thank you submitted by /u/Ok_Salt_9211 [link] [comments]  ( 8 min )
    How to use ChatGPT to increase your website conversions
    submitted by /u/Senior_tasteey [link] [comments]  ( 8 min )
    75% of Americans Believe AI Will Reduce Jobs
    75% of Americans believe that AI will decrease the total number of jobs in the next 10 years, according to a survey by Bentley-Gallup Business in Society study. Younger Americans are less pessimistic about AI compared to older age groups. Majorities believe that AI performs as well as or better than humans in customizing online content, recommending products or services, and assisting students with coursework. However, Americans are skeptical about AI's ability to recommend employees, its self-driving capabilities, and its ability to recommend medical advice. 79% of Americans have little trust in businesses to use AI responsibly. 40% of Americans think AI does more harm than good, while only 10% believe it contributes more good than harm. Black and Asian Americans have a more positive view of AI's impact on society compared to Hispanic and White Americans. While most Americans are wary of AI's impact on the job market, younger people are more optimistic about its future. Businesses need to affirm their commitment to using AI responsibly and address the knowledge deficit and lack of confidence among Americans. Source : https://news.gallup.com/opinion/gallup/510635/three-four-americans-believe-reduce-jobs.aspx submitted by /u/NuseAI [link] [comments]  ( 9 min )
    One-Minute Daily AI News 9/20/2023
    OpenAI unveils DALL-E 3, allows artists to opt out of training.[1] Infosys, the Indian tech giant, has announced a strategic partnership with NVIDIA, a leading provider of enterprise-grade AI solutions. The collaboration aims to empower enterprises and businesses with generative AI-based solutions that drive productivity.[2] Alibaba and Tencent Invest in Chinese State-Backed Zhipu AI.[3] John Grisham, George R.R. Martin and more authors sue OpenAI for copyright infringement.[4] Sources: [1] https://techcrunch.com/2023/09/20/openai-unveils-dall-e-3-allows-artists-to-opt-out-of-training/ [2] https://gameishard.gg/news/infosys-and-nvidia-partner-to-deliver-generative-ai-solutions/205456/ [3] https://winbuzzer.com/2023/09/20/alibaba-and-tencent-invest-in-chinese-state-backed-zhipu-ai-xcxwbn/ [4] https://apnews.com/article/openai-lawsuit-authors-grisham-george-rr-martin-37f9073ab67ab25b7e6b2975b2a63bfe submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
There is no specific license dedicated to artificial intelligence that prevents it from extensively scraping publicly available data on the internet without providing proper source attribution
    When AI takes all the data for training purposes without feedback to UGC platforms, could this eventually lead to the demise of UGC platforms, and could the internet become increasingly closed until it collapses? submitted by /u/oodzchen [link] [comments]  ( 9 min )
    Canva AI Blurred My Image
    Asked the new Canva AI to blur my image. It Blurred my image. submitted by /u/MDINOKC [link] [comments]  ( 8 min )
  • Open

    Naming Awk
The Awk programming language was named after the initials of its creators. In the preface to a book that just came out, The AWK Programming Language, Second Edition, the authors give a little background on this. Naming a language after its creators shows a certain paucity of imagination. In our defense, we didn’t have a […] Naming Awk first appeared on John D. Cook.  ( 4 min )
  • Open

    [D] What fundamentally prevents training with Volunteer Computing?
    I’d love to see people come together, donate their spare compute to train more open source models, boost research etc. One relevant paper I was able to find is this one https://arxiv.org/abs/2103.08894 submitted by /u/tecbar [link] [comments]  ( 8 min )
    AAAI24 fast track submission [D]
    I'm planning on submitting a rejected NeurIPS paper to the AAAI fast track. A few days ago I registered myself as an author on CMT, but I cannot see an option to submit a paper. Will the portal open later? Can anyone else see an option to submit yet? submitted by /u/Firm-Act-3860 [link] [comments]  ( 9 min )
    [D] Can you go to NeurIPS workshops without presenting?
    This year will be my first NeurIPS, and I see some cool workshops but don't have any work that would be a good fit for them. Can I just go and listen? submitted by /u/ThickBiker [link] [comments]  ( 8 min )
    [D] Motherboard Help
    So a few weeks ago I purchased a Sega 36 Crane Machine (Claw Machine) off ebay and everything worked great except the claw strength being too strong. I took the board out so I could possibly have something added to the board for me. It was missing a “Free Play” button which is used to adjust claw strength on this machine.. Anyway, I kept the board in the garage while searching for the right parts and dad threw it away on accident while cleaning, now im out a board and I have no idea what to do. Can anybody help me or am I sol? Even if im in the wrong community for this, a step to the right direction would even help. I also have attached some pictures of the board.. Im not good with these boards at all and have no clue what im looking for. Theres also a diagram of the boards functions! Thanks for all information/help! submitted by /u/Ready_Highlight9758 [link] [comments]  ( 9 min )
    Open-Source Trading AI [P]
    Hi, I am an experienced trader and coder. I am embarking on a journey to develop an open-source Trading AI in Python, and I'm looking for passionate individuals to join me in this project. This is a non-commercial, community-driven project, so there won't be any monetary compensation involved. However, it's a great opportunity to learn and collaborate in the field of trading AI. ​ - You don't need to know trading to contribute. - As an Open Source project, you have complete freedom to use the AI. ​ About the Project: - Objective: Our goal is to create a self-learning AI system for trading in financial markets. - Tech Stack: We'll be working with Python for data analysis, machine learning, and neural network development. - Data: We'll be using historical stock market data to train a…  ( 9 min )
    [P] MLE RecSys / Ops at DeoVR
    Hi, we're welcoming talented Machine Learning Engineers with a focus on Recommendation Systems and ML Operations. Who are we? Leading in VR video streaming, we drive the DeoVR platform with interactive 8K videos, ML, computer vision, and advanced haptic tech. What you'd do? For MLE RecSys: develop ML pipelines for top recommendations, engage with community for desired features, and collaborate with Backend/Frontend/Unity teams. For ML Ops: construct ML inference infrastructure, refine models & workflows, and ensure high-availability ML services. What we offer? 🌍Remote flexibility 🤝Collaborative and inclusive work environment 🚀Make a significant impact in the VR industry Interested? Check out and apply here or drop me a DM😎 submitted by /u/SanjaVR [link] [comments]  ( 9 min )
    [D] Where to find checkpoints for models (with different seeds) trained on the Imagenet?
    i.e., models with the exact same architecture, but their initial weights are different, and the order which the model sees the training data is different. submitted by /u/just2gud [link] [comments]  ( 8 min )
    [P] BlindChat: Fully in-browser and private Conversational AI with Transformers.js for local inference
    We are happy to share with you BlindChat, the open-source and privacy-by-design alternative to ChatGPT for a fully in-browser, yet private, Conversational AI! You can play with it on our Gradio demo. Our philosophy is that Conversational AI should be easily accessible, and privacy should not be a luxury but a given. By leveraging local models running in the browser, with transformers.js, we make it possible to have a fully transparent and private AI that works on your browser without any extra setup. Because all the logic is offloaded to users’ devices, data never leaves and there is no risk of it being used for finetuning by third parties. For now, we only support inference with LaMini-Flan-T5, so you might see modest performance. We plan to integrate Microsoft phi-1.5 for better performance once the 370M is out. We are also working on LlamaIndex-TS integration on the client side to have in-browser RAG for local querying of private documents. As our roadmap is quite dense, with RAG, internet search, improved inference, we welcome warmly contributors! If you want to contribute, or have questions, ping us on Discord and GitHub! submitted by /u/Separate-Still3770 [link] [comments]  ( 9 min )
    [R] DeepMind: LLMs compress images 43% better than PNG, and audio nearly 2x better than MP3
Edit: FLAC is the tested audio extension, not MP3 I read the new paper from DeepMind so you don't have to. Here are the key highlights: Despite training on text, language models compressed images 43% better than PNG, and audio nearly 2x better than FLAC. Confirmation of scaling laws - bigger models compressed better. But model size must match dataset size. There are tradeoffs between model scale, data size, and compression performance. More data enables bigger models. Tokenization (like BPE) generally hurts compression slightly by making prediction harder. Longer contexts let models exploit more sequential dependencies. Implications: Models have learned very general capabilities beyond just text. Their strong compression reflects deep understanding of images, audio etc statistically. I got some new perspective on model scaling laws and links between prediction and generalization. There's potential for practical applications compressing images, video etc. But large model size is an issue. Overall it shows these models are very capable general purpose learners, not just for language. Full summary here if you want more details. Original paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [D] Stanford's AI Professional Program
    Hi, I am interested in taking couple of AI/ML online courses from the Stanford's Artificial Intelligence Professional Program (https://online.stanford.edu/programs/artificial-intelligence-professional-program). I am interested in taking XCS221 and XCS229. My employer would be paying for this (I hope!). I know these are not the complete courses offered in the university. Has anyone taken courses from this program ? Is it worth it ? submitted by /u/RealMadrista007 [link] [comments]  ( 9 min )
    [N] OpenAI's new language model gpt-3.5-turbo-instruct can defeat chess engine Fairy-Stockfish 14 at level 5
    This Twitter thread claims that OpenAI's new language model gpt-3.5-turbo-instruct can "readily" beat Lichess Stockfish level 4. This tweet shows the style of prompts that are being used to get these results with the new language model. I used website parrotchess[dot]com (discovered here) to play multiple games of chess purportedly pitting this new language model vs. various levels of Fairy-Stockfish 14 at website Lichess. My current results for all completed games: The language model is 2-0 vs. Fairy-Stockfish 14 level 5 (game 1, game 2), and 0-2 vs. Fairy-Stockfish 14 level 6 (game 1, game 2). One game I aborted because the language model apparently tried an illegal move. The following is a screenshot from the aforementioned chess web app showing the end state of the first game vs. Fai…  ( 9 min )
    [D] What would be your approach if you were to write a Tree-of-Thoughts model?
I've been playing around with transformers since following Karpathy's Shakespeare model on YouTube and found it really cool. Thought I should write a Tree-of-Thoughts model as my next project, which will be somewhat similar, given what we're doing is learning the meaning behind words and phrases and then using that knowledge to generate a sequence based on a seed token or sequence of tokens. Here's how I was thinking it should go: I write a next-token prediction model and a generate function that generates a sequence of tokens on top of a seed token. Then I write a tree model that takes whatever this bigram has learned and adds on top of it a way to generate one thought about the seed and stop there, then iteratively generate a few more such thoughts based on the same seed. This is how I initially thought it would go, but now that I'm writing, I've already written the bigram (next-token prediction and generation) and I don't know what to do next. I am kind of stuck. It will be interesting to see how you, if you've ever written a similar model or are good at NLP, would attempt this. Looking at a variety of different approaches to the same problem I am trying to solve would be great. So I decided to post here. Would be great to hear from you guys! submitted by /u/GraphicsMonster [link] [comments]  ( 9 min )
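One generic way to structure the next step is a breadth-first Tree-of-Thoughts loop on top of any generator: propose several candidate continuations ("thoughts") per state, score them, keep the best few, and expand again. The sketch below is deliberately model-agnostic; propose and score are placeholders for the bigram generator and whatever evaluator you choose.

def tree_of_thoughts(seed, propose, score, depth=3, breadth=2, k=4):
    # propose(state, k) -> k candidate thought strings; score(state) -> float
    frontier = [seed]
    for _ in range(depth):
        candidates = []
        for state in frontier:
            for thought in propose(state, k):                          # expand each kept state
                candidates.append(state + thought)
        frontier = sorted(candidates, key=score, reverse=True)[:breadth]  # prune to the best few
    return max(frontier, key=score)

The scorer is the part the bigram model does not give you; with an LLM it is usually another prompt ("rate this partial solution"), while for toy models a heuristic or task-specific check works.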
    [N] Toyota Research Institute Unveils Breakthrough in Teaching Robots New Behaviors
    Press release Diffusion Policy: Visuomotor Policy Learning via Action Diffusion (contains link to paper) Comments on Hacker News with some interesting info / links. submitted by /u/falconberger [link] [comments]  ( 8 min )
    Looking for this particular dataset on Cervical cancer [R]
    Hi all, I have found this dataset on Kaggle with zero information about it. It is very intriguing and would like to find the original researchers about it. It contains over 270,000 histopathological images of cervical cancer. I cannot contact the person who posted it on Kaggle because I am not a contributor yet. And they are not replying on other platforms. Any help on finding it will be very much appreciated. If not, where else can I get such large cervical or bladder cancer datasets? Link to the dataset mentioned: https://www.kaggle.com/datasets/rzelite/cervical-cancer Thanks in advance. submitted by /u/dumb_persn [link] [comments]  ( 9 min )
    [N] GitHub CEO interview regarding AI and programming
GitHub CEO Thomas Dohmke spoke on stage at TC Disrupt today and made several statements regarding the development of artificial intelligence and programming: Dohmke believes that AI won't replace software developers but will make them more efficient. Despite AI advancements, the demand for software developers will continue to exceed the supply. The growth in software usage is expected to be exponential, with every company becoming a software company. Legacy code maintenance remains crucial, especially in industries like finance with outdated codebases. Generative AI is creating more demand for technical talent as companies seek to adopt innovative AI solutions. There is a shortage of computer science students, leading to increased demand for developers with AI skills. Source: https://techcrunch.com/2023/09/20/github-ceo-despite-ai-gains-demand-for-software-developers-will-still-outweigh-supply/ submitted by /u/gcore-com [link] [comments]  ( 9 min )
    [D] TimeGAN - doubt on generated sequence
    Hello everyone I have a doubt regarding GANs for tabular data, more specifically for time-series generation. Looking at the TimeGAN paper (code) and other implementations such as the one by YData, considering that the training dataset is divided into slices (like a rolling window) of N length and that the generated sequence will follow the same format, that is [batch size, N, n_features], what we are effectively generating are slices and not a fully synthetic time-series. ​ To clarify my question, the output of a TimeGAN is only a set of slices, although there is no guarantee that generated slice(N-1) is the one before slice(N), and so the true use of a TimeGAN is generating slices that can be used to train a predictive TS model but not reconstructing a complete time series like the one used for training. ​ I suspect that's the case (useful only to generate good slices for training predictive models) as there is no fully generated time series in the paper or any of the codes, while the quality of the generated data is partially measured by training a model on original data and generated data and comparing their predictive performance. If anyone can confirm/develop upon these thoughts I would be extremely grateful. submitted by /u/iReallyReadiT [link] [comments]  ( 9 min )
    [P] Quick fine-tuning image classification models from Bing image search
I've been working on our interactive dataset explorer for machine learning: renumics-spotlight. Recently, I set up an image classification example using it. I utilized Bing image search to create a fully functional example for custom-defined classes. To my surprise, it not only worked efficiently but also took only a few minutes for fine-tuning. The best part is its adaptability. You can easily switch it up for different image classes. Maybe you like it. Let me know what you think or if you have any suggestions to improve!
Install with: pip install renumics-spotlight sliceguard[all]
Imports:
from renumics import spotlight
from sliceguard.data import create_imagedataset_from_bing
from sliceguard.models.huggingface import finetune_image_classifier, generate_image_pred_probs_embeddings
from sliceguard.embeddings import generate_image_embeddings
Create an Image Dataset from Bing:
class_names = [
    "Blue Tang",
    "Clownfish",
    "Spotted Eagle Ray",
    "Longnose Butterfly Fish",
    "Moorish Idol",
    "Royal Gramma Fish",
]
df = create_imagedataset_from_bing(
    class_names, 25, "data", test_split=0.2, license="Free to share and use"
)
Fine-tune a ViT Model with the data (in 1-2 minutes on a GPU):
finetune_image_classifier(
    df[df["split"] == "train"],
    model_name="google/vit-base-patch16-224-in21k",
    output_model_folder="./model_folder",
    epochs=15,
)
Enrich the DataFrame with Predictions, Probabilities and Embeddings and visualize it:
df["prediction"], df["probs"], df["embeddings"] = generate_image_pred_probs_embeddings(
    df["image"].values, model_name="./model_folder"
)
# Check the result and detect problematic clusters
spotlight.show(
    df, layout="https://spotlight.renumics.com/resources/image_classification_v1.0.json"
)
submitted by /u/DocBrownMS [link] [comments]  ( 9 min )
    [D] finetuning quantized model is a bad idea?
Hi there, due to my limited resources, I have to use a quantized big model or something lighter. In this situation, I want to try the first option and fine-tune such a model. What results would you expect? Will training a huge model in a 4-bit setting be significantly different from the original full-precision setting? Thanks. submitted by /u/Mundane_Definition_8 [link] [comments]  ( 9 min )
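For context, the usual workaround for fine-tuning under such constraints is to freeze the 4-bit base weights and train small adapter layers on top (QLoRA-style). A minimal sketch, assuming the Hugging Face transformers/peft/bitsandbytes stack; the model name and hyperparameters are only illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-2-7b-hf"  # illustrative; any causal LM works

# Load the base model in 4-bit; its quantized weights stay frozen.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Only the small LoRA adapter matrices are trained, which is what keeps memory manageable.
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```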
    [D] UNCERTAINTY IN GRADIENT BOOSTING VIA ENSEMBLES
Paper: https://doi.org/10.48550/arXiv.2006.10562 Hi all, this paper explores using a single model (meaning an ensemble of trees) to generate uncertainty estimates. This technique has been implemented in CatBoost. My question is: why hasn't this been implemented in XGBoost? The technique looks easily applicable, but I would have expected it to be implemented already as it is 2 years old. Is it for some reason not applicable to XGBoost? Figure 1 from the paper, showing the 'virtual' ensemble. submitted by /u/MetalOrganicKneeJerk [link] [comments]  ( 9 min )
    Context Vectors Embedding [R]
Previously I was spoiled by LangChain. Suppose I have a bunch of PDFs that I want to store as context vectors, and I want to use an open-source LLM. Without using LangChain, how do I generate the context vectors? (I will store them in vector databases.) submitted by /u/stephenhky [link] [comments]  ( 9 min )
    [N] OpenAI Announced DALL-E 3: Art Generator Powered by ChatGPT
For those who missed it: DALL-E 3 was announced today by OpenAI, and here are some interesting things: No need to be a prompt engineering grand master - DALL-E 3 enables you to use the ChatGPT conversational interface to improve the images you generate. This means that if you didn't like what it produced, you can simply talk with ChatGPT and ask for the changes you'd like to make. This removes the complexity associated with prompt engineering, which requires you to iterate over the prompt. Major improvement in output quality compared to DALL-E 2. This is a very vague statement provided by OpenAI, which is also hard to measure, but personally, they haven't failed me so far, so I'm really excited to see the results. DALL-E 2 vs. DALL-E 3, image by OpenAI. From October, DALL-E 3 will be available through ChatGPT and the API for those with the Plus or Enterprise version. And there is much more news! 🤗 I've gathered all the information in this blog 👉 https://dagshub.com/blog/dall-e-3/ submitted by /u/RepresentativeCod613 [link] [comments]  ( 9 min )
  • Open

    Distilling step-by-step: Outperforming larger language models with less training data and smaller model sizes
    Posted by Cheng-Yu Hsieh, Student Researcher, and Chen-Yu Lee, Research Scientist, Cloud AI Team Large language models (LLMs) have enabled a new data-efficient learning paradigm wherein they can be used to solve unseen new tasks via zero-shot or few-shot prompting. However, LLMs are challenging to deploy for real-world applications due to their sheer size. For instance, serving a single 175 billion LLM requires at least 350GB of GPU memory using specialized infrastructure, not to mention that today's state-of-the-art LLMs are composed of over 500 billion parameters. Such computational requirements are inaccessible for many research teams, especially for applications that require low latency performance. To circumvent these deployment challenges, practitioners often choose to deplo…  ( 93 min )
  • Open

    Ddqn tunning
Hi world, I'm trying to implement DQN and DDQN on various playgrounds in Matlab, from scratch. I use DQN and DDQN with a replay buffer, and I tried to use gradient clipping, but I find it very difficult to tune. I mean, I have tuned and programmed various algorithms successfully, such as different GANs, which are not necessarily easy to tune either; RL algorithms seem even worse. I tried several grid world problems, cart pole, and even a very simple second-order system regulation (integrator + first-order pole). Even the Matlab cart pole demo with DDQN does not give a very satisfactory result (after training, the controller keeps the pendulum in the upright position, but the cart slowly drifts). On the second-order system, the learning seems OK for about 200 episodes (tests show that the expected behavior is almost learnt), and then suddenly the average episode score has a kind of inflection and everything blows up, despite a very large replay buffer. So my questions: is it possible to make DQN and DDQN work well with a reasonable tuning session length? Are policy optimization methods easier to tune (and/or more efficient)? So far, it seems to me that these action-value based algorithms are highly unstable and the training may only work for a 'miraculous' tuning. What are your thoughts? submitted by /u/seb59 [link] [comments]  ( 9 min )
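As a reference point for what the double-DQN update usually looks like, here is a minimal PyTorch sketch (not the poster's Matlab code; the networks, optimizer, and batch tensors are assumed to exist):

```python
import torch
import torch.nn.functional as F

def ddqn_update(online_net, target_net, optimizer, batch, gamma=0.99, max_grad_norm=10.0):
    # batch: float tensors (state, action, reward, next_state, done), with done in {0, 1}
    state, action, reward, next_state, done = batch

    # Double DQN: the online net picks the next action, the target net evaluates it.
    with torch.no_grad():
        next_action = online_net(next_state).argmax(dim=1, keepdim=True)
        next_q = target_net(next_state).gather(1, next_action).squeeze(1)
        target = reward + gamma * (1.0 - done) * next_q

    q = online_net(state).gather(1, action.long().unsqueeze(1)).squeeze(1)
    loss = F.smooth_l1_loss(q, target)

    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(online_net.parameters(), max_grad_norm)  # gradient clipping
    optimizer.step()
    return loss.item()
```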
    Implementation of Reinforcement Learning to achieve an autonomous drone
Hello there, I'll provide a quick introduction first. I am a mechatronics engineering student who is graduating this semester. I have also been privately studying ML, DL, and CV for the past year, because that's when I decided that I want to make an autonomous drone as my thesis/graduation project and, oh my god, little did I know. I have acquired so much knowledge in the data science field and, truth be told, I love it; it engages my mind like crazy, so I've decided that I would like my career to revolve around electronics and programming / data science. Back to the topic: I am working on object detection (I got it figured out) and reinforcement learning. What I would like to do is to train the model to reach its destination using collision avoidance. I have a PIX4 and an RPI 4B 4GB, and since I'm also building my drone from scratch, let's say I have all the hardware, which of course includes the GPS + compass and 5 ultrasonic sensors, 1 at each side and 1 facing down to hold the altitude accurately. I was thinking of HITL, making a virtual environment to train my model there, because this is what logic says, of course (I could also fly it using the RC controller and, for example, test whether the obstacle avoidance is going to overrule the manual drive, which is also my objective). The questions are: how can I make sure the virtual drone is going to be equipped with the real-life sensors in the needed positions? How can I feed all the sensor data during training? Which type of reinforcement learning models should I implement? I just feel like I dragged myself into an incredibly amazing and complex project which is going to end up fucking my semester and I won't be able to graduate; my stress now is skyrocketing. Please feel free to throw any advice or opinions my way, and thank you for keeping up with this long post. 😁 submitted by /u/Gabii99 [link] [comments]  ( 10 min )
    Cost function for a deep q network
I am using Game Maker and I just wanted to check if this is how to do the cost function.

    loss_function = 0
    derivative_gradient = 0
    array_copy(global.main_inputs, 0, global.inputs, 0, array_length(global.inputs))
    for (var i = 0; i < array_length(buffer_sampling); i++) {
        var _reward = buffer_sampling[i][2]
        global.inputs = buffer_sampling[i][3]
        var _max_q = forward_prop_t() * global.gamma
        var yi = _reward + _max_q
        var cur_q_value = buffer_sampling[i][1][0]
        loss_function += power((yi - cur_q_value), 2)
        derivative_gradient += 2 * (cur_q_value - yi)
    }
    array_copy(global.inputs, 0, global.main_inputs, 0, array_length(global.main_inputs))
    derivative_gradient = derivative_gradient * (1 / array_length(buffer_sampling))
    global.cost = loss_function * (1 / array_length(buffer_sampling))

Basically, buffer_sampling is an array of tuples of [first_state, [q_value, action], reward, next_state]. Thanks for any help! submitted by /u/Daninjacat256 [link] [comments]  ( 9 min )
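For reference, assuming forward_prop_t() returns the target network's maximum Q-value for the stored next state, the quantities that snippet computes correspond to the standard DQN loss and its gradient over a batch of B sampled transitions:

```latex
L = \frac{1}{B} \sum_{i=1}^{B} \bigl( y_i - Q(s_i, a_i) \bigr)^2,
\qquad
y_i = r_i + \gamma \max_{a'} Q_{\text{target}}(s'_i, a'),
\qquad
\frac{\partial L}{\partial Q(s_i, a_i)} = \frac{2}{B} \bigl( Q(s_i, a_i) - y_i \bigr).
```

One standard detail worth double-checking is that y_i is usually set to just r_i for terminal transitions, i.e. the bootstrap term is dropped when the episode ends at that step.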
    Building intuition on AC algo with TD
I am trying to build intuition on why the actor update equation actually helps the actor improve its policy. From what I understand, the update is proportional to the critic's TD error. A better-than-expected reward will lead the actor to increase the probability of taking the associated action, and a disappointing reward will make the actor move away from the given action. Here's where I feel like I'm missing a piece of the puzzle: between a good state/action pair accurately valued by the critic and a bad state/action pair undervalued by the critic, the actor's update will favor the bad action more. Could we not conceive a scenario in which the agent gets stuck in a suboptimal policy because of that particular behavior? What triggers this questioning is that I have implemented a standard AC algorithm for Pixelcopter and found my agent getting stuck easily on a suboptimal policy (i.e. constantly going higher, or lower until it crashes, while the critic assigns good values to center squares and bad values to squares close to the wall's edge). For TD(0) this is pretty marginal, but it gets more significant for n-step TD as n increases. Any thoughts? submitted by /u/infundibuliforme [link] [comments]  ( 9 min )
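For reference, the update being discussed (one-step TD actor-critic) can be sketched as follows; this is a generic textbook version rather than the poster's Pixelcopter code, and it assumes the actor returns a torch.distributions object and the optimizer holds both networks' parameters:

```python
import torch

def actor_critic_step(actor, critic, optimizer, s, a, r, s_next, done, gamma=0.99):
    # Critic's TD error: how much better (or worse) the outcome was than expected.
    with torch.no_grad():
        td_target = r + gamma * critic(s_next) * (1.0 - done)
    td_error = td_target - critic(s)

    # Actor: increase the log-prob of the taken action in proportion to the TD error;
    # Critic: regress its value estimate toward the TD target.
    log_prob = actor(s).log_prob(a)
    actor_loss = -(td_error.detach() * log_prob).mean()
    critic_loss = td_error.pow(2).mean()

    optimizer.zero_grad()
    (actor_loss + critic_loss).backward()
    optimizer.step()
    return td_error.mean().item()
```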
    [R] The League of Robot Runners: Coordinate thousands of robots in real time!
    Hello machine and reinforcement learners! This is an announcement and call for participation in the League of Robot Runners, a new 🚀 competition and research initiative 🚀 that tackles one of the most challenging problems in industrial optimisation: Multi-Robot Path Planning (sometimes also called Multi-Agent Path Finding). Recently launched at ICAPS 2023, the competition is inspired by a variety of new and newly emerging applications that rely on mobile robotics 🦾🤖. For example, Amazon automated warehouses, where up to thousands of robots work together to ensure safe and efficient package delivery 🧸📦 🚚 ❤️. Participants in the competition are asked to compute coordinated and collision-free movement plans ⤴️ ➡️ ⤵️ 🔄 for a team of robotic errand runners. Get the robots to their d…  ( 10 min )
  • Open

    How United Airlines built a cost-efficient Optical Character Recognition active learning pipeline
In this post, we discuss how United Airlines, in collaboration with the Amazon Machine Learning Solutions Lab, built an active learning framework on AWS to automate the processing of passenger documents. “In order to deliver the best flying experience for our passengers and make our internal business process as efficient as possible, we have developed […]  ( 10 min )
    Optimize generative AI workloads for environmental sustainability
    To add to our guidance for optimizing deep learning workloads for sustainability on AWS, this post provides recommendations that are specific to generative AI workloads. In particular, we provide practical best practices for different customization scenarios, including training models from scratch, fine-tuning with additional data using full or parameter-efficient techniques, Retrieval Augmented Generation (RAG), and prompt engineering.  ( 10 min )
  • Open

    NVIDIA Studio Lineup Adds RTX-Powered Microsoft Surface Laptop Studio 2
    The NVIDIA Studio laptop lineup is expanding with the new Microsoft Surface Laptop Studio 2, powered by GeForce RTX 4060, GeForce RTX 4050 or NVIDIA RTX 2000 Ada Generation Laptop GPUs, providing powerful performance and versatility for creators.  ( 8 min )
    Run AI on Your PC? GeForce Users Are Ahead of the Curve
    Gone are the days when AI was the domain of sprawling data centers or elite researchers. For GeForce RTX users, AI is now running on your PC. It’s personal, enhancing every keystroke, every frame and every moment. Gamers are already enjoying the benefits of AI in over 300 RTX games. Meanwhile, content creators have access Read article >  ( 8 min )
    Into the Omniverse: Blender 4.0 Alpha Release Sets Stage for New Era of OpenUSD Artistry
    For seasoned 3D artists and budding digital creation enthusiasts alike, an alpha version of the popular 3D software Blender is elevating creative journeys.  ( 7 min )
    NVIDIA CEO Jensen Huang to Headline AI Summit in Tel Aviv
    NVIDIA founder and CEO Jensen Huang will highlight the newest in generative AI and cloud computing at the NVIDIA AI Summit in Tel Aviv from Oct. 15-16. The two-day summit is set to attract more than 2,500 developers, researchers and decision-makers from across one of the world’s most vibrant technology hubs. With over 6,000 startups, Read article >  ( 5 min )
    Cash In: ‘PAYDAY 3’ Streams on GeForce NOW
    Time to get the gang back together — PAYDAY 3 streams on GeForce NOW this week. It’s one of 11 titles joining the cloud this week, including Party Animals. The Perfect Heist PAYDAY 3 is the highly anticipated sequel to one of the world’s most popular co-op shooters. Step out of retirement and back into Read article >  ( 5 min )

  • Open

    [P] How to build CI/CD pipelines with AWS SageMaker for continuous ML training
    TL;DR How to build CI/CD pipelines with AWS SageMaker for ML training: https://dagshub.com/blog/ci-cd-for-continuous-training-with-sagemaker/ One of the most time-resources-energy-consuming tasks we face when we build internal projects is setting up an instance for ML training. It's a repetitive process as we run multiple experiments over the project life cycle, with many steps and configurations that are usually documented poorly and scattered around different places. As good engineers, we decided to automate the process using CI/CD pipelines. But how? We had ZERO ideas on how to do it, so we had to go through the rigorous process of using AWS guides and tutorials to figure it out. Usually, when this happens, we extend the project lifecycle and have a team member document the process so we can refer back to it when we need to do it again. Knowing this can benefit the community, we decided to share a series of blogs that guide you through building CI/CD pipelines for continuous training with AWS SageMaker. We published the first blog, which covers the configuration part a month ago, and we are happy to share the second one which explains how to build a continuous training pipeline for ML. Configure AWS SageMaker for CI/CD: https://dagshub.com/blog/setup-sagemaker-for-ci-cd-pipelines/ How to build CI/CD pipeline with AWS SageMaker for ML training: https://dagshub.com/blog/ci-cd-for-continuous-training-with-sagemaker/ I'm sure we can improve these tutorials, and would love to learn from your experience on how we can do it! 🤗 submitted by /u/RepresentativeCod613 [link] [comments]  ( 9 min )
    [D]How to "productionize" a jupyter notebook in a technical interview?
Hi redditors! I've been given a technical take-home assignment that asks me to "productionize" a Jupyter notebook. The notebook's workflow is as follows: it reads data from a CSV, pre-processes the data, trains a machine learning model, and makes predictions. My initial thoughts are: Separation of concerns: break the notebook down into distinct components (data processing, model training, and inference). Containerization: write a Dockerfile to ensure the environment is reproducible and isolated. API for prediction: set up a Flask-based service to expose the model's prediction capability. However, I'm grappling with a few challenges: I'm not seeing a stark difference between the Jupyter notebook setup and the production setting, because the model is small and training is not complicated. If I were to deploy to Kubernetes pods, it seems wasteful that many pods would be doing the same thing (prediction for the same model; maybe just as a high-availability guarantee?). Does the training phase or data cleansing phase need to be containerized and deployed? This seems like a one-off process. How do I deploy a scalable service? I am new to this. Each container may have an HTTP service; how do I load balance them on Kubernetes from outside? What kind of CI/CD do you recommend for this task? What kind of testing or pipelines are needed? submitted by /u/zjplab [link] [comments]  ( 9 min )
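For the prediction-API piece, a minimal sketch of what a Flask service in front of a serialized model might look like (the file name and feature schema are made up for illustration):

```python
import joblib
import pandas as pd
from flask import Flask, jsonify, request

app = Flask(__name__)
model = joblib.load("model.joblib")  # artifact produced by the training step

@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON list of records matching the training features.
    records = request.get_json()
    features = pd.DataFrame(records)
    preds = model.predict(features)
    return jsonify(predictions=preds.tolist())

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```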
    [P][N] Announcing Zivy Scholar – An AI tool to help researchers consume papers.
Hi r/MachineLearning After a long beta, we are excited to announce Zivy Scholar to the public! Zivy Scholar is a tool that helps researchers consume research papers. I'm one of the creators, and I've found that I want to listen to research papers in the car or when I'm working out. Current features include: Listen to and read along with a research paper. Share the paper with colleagues and friends. We use state-of-the-art PDF data extraction techniques with TTS to provide this functionality. Features we're planning: Full PDF-to-HTML conversion, including images, tables, and figures inline for the read-along experience, plus mobile optimization. This functionality is based on some newer research and we're excited to bring it to you all! Discussion and feedback are welcome! Cheers, Collin submitted by /u/collin_code_77 [link] [comments]  ( 9 min )
    [D] Which is the best Model for generation of code?
The title basically conveys the entire message: which, according to you, is the best AI model for code generation? I mainly code in Python with AI and deep learning as a core focus, recently started doing a bit of C++, and so far have only used Bard and ChatGPT 3.5. With this experience I can confidently say that GPT outperforms Bard by a huge margin. There are usually some minor modifications that need to be done, but that is part of a dev's life, isn't it? Recently a lot of new models are picking up steam, hence I was wondering if there exists a model that is better than GPT. What are your views? submitted by /u/rakk109 [link] [comments]  ( 9 min )
    [P] looking for cs students to collaborate with on the E-Bay ML challenge
Looking for CS students to team up with for the eBay ML challenge. eBay hosts an ML challenge where the winning team gets an internship. We already have a team in place and have made some significant progress; we're now looking for team members to take us even further. https://eval.ai/web/challenges/challenge-page/2014/overview submitted by /u/thelongshortseller [link] [comments]  ( 9 min )
    [D] Do NeurIPs workshop papers get published?
    I’m submitting to the workshop and was wondering if the papers there get published? submitted by /u/Odd-Distance-4439 [link] [comments]  ( 8 min )
    Silent Data Corruption affects Large Model Training [News]
    Sharing an investigation we did at Adept into some issues that were causing problems with training LLMs at scale. Sometimes the hardware makes silent errors! How we found them and tracked down the problematic machines. https://www.adept.ai/blog/sherlock-sdc submitted by /u/ekelsen [link] [comments]  ( 9 min )
    [D] Which specs would be better for a beginner: a 12th gen i7 + RTX 3060 or 13th gen i7 + RTX 3050 (laptop)
I'm looking to buy an upgrade, and at my price point (~850 GBP) I can get a new 13th gen i7 + RTX 3050, or a used RTX 3060 + 12th gen i7, both 16GB laptops. Unfortunately I really need the mobility of a laptop in my life at the moment and I realise this isn't ideal. I'll be looking at and experimenting with basic networks, no language models, maybe a few GANs for fun, some DL networks for the Atari gym and other experiments. All this for learning, with my own custom networks for various data. I'll be running Linux, possibly Debian. I'd love to hear your comments or suggestions if there's something better at my price point in a laptop. Thanks submitted by /u/Mean_Actuator3911 [link] [comments]  ( 9 min )
    [R] Robust e-NeRF: NeRF from Sparse & Noisy Events under Non-Uniform Motion
    Project Page Paper Code Abstract: Event cameras offer many advantages over standard cameras due to their distinctive principle of operation: low power, low latency, high temporal resolution and high dynamic range. Nonetheless, the success of many downstream visual applications also hinges on an efficient and effective scene representation, where Neural Radiance Field (NeRF) is seen as the leading candidate. Such promise and potential of event cameras and NeRF inspired recent works to investigate on the reconstruction of NeRF from moving event cameras. However, these works are mainly limited in terms of the dependence on dense and low-noise event streams, as well as generalization to arbitrary contrast threshold values and camera speed profiles. In this work, we propose Robust e-NeRF, a novel method to directly and robustly reconstruct NeRFs from moving event cameras under various real-world conditions, especially from sparse and noisy events generated under non-uniform motion. It consists of two key components: a realistic event generation model that accounts for various intrinsic parameters (e.g. time-independent, asymmetric threshold and refractory period) and non-idealities (e.g. pixel-to-pixel threshold variation), as well as a complementary pair of normalized reconstruction losses that can effectively generalize to arbitrary speed profiles and intrinsic parameter values without such prior knowledge. Experiments on real and novel realistically simulated sequences verify our effectiveness. Our code, synthetic dataset and improved event simulator are public. submitted by /u/Sirisian [link] [comments]  ( 9 min )
    [N] Savelikeapro.ai: AI powered, WhatsApp driven bookmarking for productivity.
👋 Try savelikeapro.ai. It's AI-powered and zero-installation, and it fits into your daily workflow. There is a free-forever option. submitted by /u/prithivida [link] [comments]  ( 8 min )
    [R] Program generation is all you need? For math, symbolic reasoning, natural language, etc.
The paper reports improved performance by prompting LLMs with "natural language embedded programs" (NLEPs). No task-specific prompt is needed. Paper: https://arxiv.org/abs/2309.10814 An automatic NLEP generation toolkit is open-sourced: https://github.com/luohongyin/langcode An example Colab notebook is included in the GitHub repo. This work introduces the following features of NLEP: An NLEP is a full Python program that prints the target response of the LLM. Task-general NLEP prompting outperforms task-specific chain-of-thought prompting on math, symbolic reasoning, and natural language tasks. It enables chain-of-thought-style reasoning in small models (RoBERTa) on text classification. Hierarchical instructing via program completion. submitted by /u/SUKHOIHY [link] [comments]  ( 9 min )
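To make the idea concrete, here is a purely illustrative toy example of the kind of output such prompting aims for (a full Python program whose printout is the answer); it is not taken from the paper or the toolkit:

```python
# Question: "A library has 3 shelves with 24 books each and lends out 15 books. How many remain?"

# Step 1: total books across all shelves
total_books = 3 * 24

# Step 2: subtract the books that were lent out
remaining = total_books - 15

# Step 3: print the target response
print(f"The library has {remaining} books remaining.")
```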
    [P] Implications of Unequal Fold Sizes in Cross-Validation
    I’m working on a project where I initially split my dataset into k equally sized folds for cross-validation. However, I want to perform some additional sampling operations on the training set within each fold, which would result in varying fold sizes. Is this practice acceptable, or does it violate the premises of traditional cross-validation? If so, are there any papers or resources that explore the implications of varying fold sizes in cross-validation? Thank you for your insights! submitted by /u/Leading_Complex7425 [link] [comments]  ( 9 min )
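One common way to frame this (a sketch assuming scikit-learn; the oversampling step is just illustrative) is to keep the validation folds untouched and apply any extra sampling only to the training portion of each split, so fold-to-fold comparisons stay fair:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.utils import resample
from sklearn.linear_model import LogisticRegression

X, y = np.random.randn(500, 10), np.random.randint(0, 2, 500)  # toy data
scores = []

for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    X_train, y_train = X[train_idx], y[train_idx]
    # Extra sampling on the training split only (e.g., bootstrapping it to a larger size);
    # the validation fold stays exactly as originally defined.
    X_train, y_train = resample(X_train, y_train, n_samples=2 * len(train_idx), random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    scores.append(model.score(X[val_idx], y[val_idx]))

print(np.mean(scores))
```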
    [P] Running LLM on desktop/mobile (Hybrid distant/local)
    Hey, I was checking out tauri last week. I was so blown away that I wrote a bit more recently and wrote hf.co/chat desktop/mobile API + local LLM. https://github.com/Narsil/hf-chat Just thought I should share in case others are interested, and wanted to make a bit shoutout to tauri team, the docs, and overall UX is top notch, basically everything pretty much worked out of the box without any hiccup. submitted by /u/narsilouu [link] [comments]  ( 9 min )
    [D] NeurIPS 2023 paper acceptance results
    NeurIPS 2023 paper acceptance results are supposed to be released at 8 pm (CDT) on September 21. I thought to create a discussion thread for us to countdown and discuss any celebration/issue/complaint/feedback or anything else. There is so much noise in the reviews every year. Some good work that the authors are proud of might get rejected because of the noisy system, given that NeurIPS is growing so large these years. We should keep in mind that the work is still valuable no matter what the final result is. submitted by /u/Apprentice12358 [link] [comments]  ( 9 min )
    [D] Attention mechanism issue
Hello, I'm trying to train a multi-modal transformer for activity recognition. I employed a two-stream architecture, where one stream is a Transformer that takes in a sequence of skeleton trajectories, and the other is a pre-trained Vision Transformer. When I train the model and investigate the attention weights for the Transformer that takes in skeleton trajectories, I notice that all of them are approximately the same (~0.029). I'm not really sure what is going on here. I've tried changing the learning rate and tried different optimizers (SGD, Adam), but I'm still not sure what is causing this. Please suggest some debugging steps, or anything in particular I should look at that could cause this. submitted by /u/Terrible-Ad6239 [link] [comments]  ( 9 min )
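Uniform attention weights usually mean the pre-softmax scores are nearly identical across positions, so one quick check (a generic sketch, not tied to the poster's model) is to look at the spread of the query-key scores before the softmax:

```python
import torch

def inspect_attention_scores(q, k):
    """q, k: (batch, seq_len, d) tensors taken from one attention layer."""
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / d ** 0.5   # pre-softmax logits
    weights = scores.softmax(dim=-1)
    print("pre-softmax score std:", scores.std(dim=-1).mean().item())          # near zero -> ~uniform attention
    print("mean max attention weight:", weights.max(dim=-1).values.mean().item())  # ~1/seq_len when uniform, ~1 when peaked
    return scores, weights

# Tiny-magnitude projections produce near-uniform attention, similar to the pattern described above.
q = torch.randn(2, 34, 64) * 1e-3
k = torch.randn(2, 34, 64) * 1e-3
inspect_attention_scores(q, k)
```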
    [D] Zoomposium with Professor Dr. Petra Ritter: "The simulation of brains"
Zoomposium with Professor Dr. Petra Ritter: "The simulation of brains" In another installment in our "Zoomposium Series" on the topic of "Brain Research", my colleague Axel Stöcker of the "Blog der großen Fragen" and I had the great honor and pleasure of conducting an interview with the very well-known and renowned German medical doctor and neuroscientist Professor Dr. Petra Ritter. In this context, Ms. Ritter became a co-founder and leader of the co-design project "The #Virtual #Brain", which is a component of the European Open Science Cloud (EOSC) and is "a neuroinformatics platform for simulating whole brain networks using biologically realistic connectivity". She is leading the development of a virtual research environment as a collaborative research platform for sensitive health data, is head of the "German National Neuroscience Research Infrastructure Initiative (NFDI-Neuroscience)", and is involved in the development of the "Health Data Cloud #EBRAINS". Petra Ritter has been Johanna Quandt Professor and Head of the Section for Brain Simulation at the Department of Neurology with Experimental Neurology at Charité - Universitätsmedizin Berlin since 2017. There, Professor Ritter and her team are involved in the "Simulation of Brains". More at: https://philosophies.de/index.php/2023/09/17/die-simulation-von-gehirnen/ https://preview.redd.it/3cpni6o6cepb1.jpg?width=1000&format=pjpg&auto=webp&s=998c30d16ddae30511b7983abce7802dfdd54945 submitted by /u/philosophiesde [link] [comments]  ( 9 min )
    [R] From Sparse to Dense : GPT-4 Summarization with Chain of Density Prompting
The following example implements the technique from the paper "From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting", Adams et al. (2023). Edit: the library used is py-llm-core.

```python
from typing import List
from dataclasses import dataclass
from llm_core.assistants import OpenAIAssistant


@dataclass
class DenseSummary:
    denser_summary: str
    missing_entities: List[str]


@dataclass
class DenserSummaryCollection:
    system_prompt = """
    You are an expert in writing rich and dense summaries in broad domains.
    """

    prompt = """
    Article: {article}
    ----
    You will generate increasingly concise, entity-dense summaries of the above Article.
    Repeat the following 2 steps 5 times.
    - Step 1: Identify 1-3 informative Entities from the Article which are missing from the previously gene…
```

  ( 10 min )
    [P] Looking for projects and collaboration in the field of neuroscience-inspired intelligent agents
Hi, I have been looking through a few open-source projects for LLMs, but without much success in finding ones that integrate approaches from the neuroscience of human intelligence. That's why I am here to ask for projects and collaborations, whether academic, not-for-profit / open source, or commercial. It would be great if you can give me some directions for this. Thanks submitted by /u/confluence_84 [link] [comments]  ( 9 min )
    [R] "Contrastive Decoding Improves Reasoning in Large Language Models", O'Brien & Lewis 2023 (boosts LLaMA-8B to >GPT-3.5/PaLM-540B on GSM8K)
    submitted by /u/InterviewIntrepid889 [link] [comments]  ( 9 min )
    [D] Are there any good math Datasets for Training small models?
I've seen Allen AI's Lila dataset, and I want to use it for a small model, to turn math into code. However, I don't think a dataset of 300k rows is enough. Does anyone know of any bigger, similar datasets? submitted by /u/vatsadev [link] [comments]  ( 9 min )
  • Open

    Is PPO a good algorithm in terms of exploration?
I recently trained my PPO algorithm on one of my own custom environments. Although the episodic reward increased steadily in the beginning, after some time it just became constant with some occasional positive and negative spikes. I was wondering if it has something to do with the exploration problem in PPO. Any tips on how I can improve it? submitted by /u/Interesting-Weeb-699 [link] [comments]  ( 9 min )
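One common lever here is the entropy bonus, which discourages the policy from collapsing to a near-deterministic one too early. A minimal sketch with Stable-Baselines3; the coefficient and environment are only examples to adapt:

```python
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")  # stand-in for the custom environment mentioned above

model = PPO(
    "MlpPolicy",
    env,
    ent_coef=0.01,       # entropy bonus: higher values push the policy toward more exploration
    learning_rate=3e-4,
    verbose=1,
)
model.learn(total_timesteps=200_000)
```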
    reinforcement learning and rust
I'm a somewhat experienced dev, but I've never done anything related to ML or AI and want to start toying around with ML, or reinforcement learning to be specific. Since my language of choice for almost everything is Rust, I wanted to ask you guys if you have any advice on how to get started, like crates, frameworks etc., or if Rust just isn't a good fit for ML. Thanks in advance for any help! submitted by /u/linus-eing [link] [comments]  ( 9 min )
    Suggestions of Game-Focused Customizable Environment
    Hello everyone! My research group is looking for new environments that are customizable and 3D such as Unity (and ML Agents). Although the latest changes in Unity TOS shouldn't affect us researchers, we wanted to keep an eye out for potential replacements for it. We are familiar with environments like ViZDoom and DeepMind Lab, but we're looking for more examples. The main requirements are: Customizable environment 3D Capable Free (preferably open-source, but being free is enough) Thanks for the help :D submitted by /u/romulofff [link] [comments]  ( 9 min )
    RL for Tuning
Hey guys, I am new to reinforcement learning and I am trying to understand how to go about a project I'm working on. I have a dataset with 2 features (X and Y). I am trying to tune a second-order transfer function (it has 3 parameters) that maps X to an output Y', so that Y' fits Y. I currently treat the error between Y and Y' as the "state" and try to minimise the error by modelling a reward function that checks for instability and unreasonable values. I am trying to do this using TD3 but the model does not learn. I am wondering if there are any approaches that I should consider. I chose TD3 since my action space is continuous. I am sorry if I don't understand something basic; I'm a noob at this. Thanks for your help in advance. :) Some more details: the dataset values don't change; I give the same values at every time step, so I don't understand how episodes and time steps work in this context. submitted by /u/ninjaaa30 [link] [comments]  ( 9 min )
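For what it's worth, one way to phrase this kind of tuning problem as an environment (a rough sketch with made-up names; the transfer-function simulation is left as a stub, and the one-step episode choice is only one option) looks like:

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class TransferFunctionTuningEnv(gym.Env):
    """Each step, the action proposes the 3 transfer-function parameters;
    the reward is the negative fitting error between Y and the simulated Y'."""

    def __init__(self, x, y):
        super().__init__()
        self.x, self.y = x, y
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(3,), dtype=np.float32)
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(1,), dtype=np.float32)
        self.error = np.float32(0.0)

    def _simulate(self, params):
        # Stub: replace with the actual second-order transfer-function response to self.x.
        k, a, b = params
        return k * self.x

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.error = np.float32(0.0)
        return np.array([self.error], dtype=np.float32), {}

    def step(self, action):
        y_hat = self._simulate(action)
        self.error = np.float32(np.mean((self.y - y_hat) ** 2))
        reward = -float(self.error)
        terminated = True  # same data every time, so an "episode" can be a single parameter proposal
        return np.array([self.error], dtype=np.float32), reward, terminated, False, {}
```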
    What does policy collapse mean?
    I am reading the following article on SpinningUp but can't get my head around policy collapse - "This is different from normal policy gradient, which keeps new and old policies close in parameter space. But even seemingly small differences in parameter space can have very large differences in performance—so a single bad step can collapse the policy performance. This makes it dangerous to use large step sizes with vanilla policy gradients, thus hurting its sample efficiency. TRPO nicely avoids this kind of collapse, and tends to quickly and monotonically improve performance." Why would updating the parameters lead to a policy collapse? The parameters are updated based on the performance of the RL system and therefore, I don't see the point of constraints. submitted by /u/Academic-Rent7800 [link] [comments]  ( 9 min )
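The point being made is that a small step in parameter space can still move the action distribution (and hence performance) a long way, so TRPO constrains the distribution shift directly rather than the parameter change. For reference, the standard formulation of that constrained update (not quoted from the linked article) is:

```latex
\max_{\theta} \; \mathbb{E}_{s,a \sim \pi_{\theta_{\text{old}}}}
\left[ \frac{\pi_{\theta}(a \mid s)}{\pi_{\theta_{\text{old}}}(a \mid s)} \, A^{\pi_{\theta_{\text{old}}}}(s, a) \right]
\quad \text{s.t.} \quad
\mathbb{E}_{s \sim \pi_{\theta_{\text{old}}}}\!\left[ D_{\mathrm{KL}}\!\left( \pi_{\theta_{\text{old}}}(\cdot \mid s) \,\|\, \pi_{\theta}(\cdot \mid s) \right) \right] \le \delta
```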
    PPO Action masking in SB3
I'm using PPO action masking in SB3, which works well in training, masking the illegal / invalid actions according to my criteria. However, I have 2 problems during testing.

1. I use the same masking code in testing too, since the model was unable to learn the masking criteria by itself during training, but even when enforcing it in code at test time it still doesn't mask anything.

2. The model converges to just one action in the testing phase after 500k steps, even though the episode reward is still increasing in the training phase. I don't know why; maybe overfitting!

Code for training with action masking:

    def mask_fn(env: gym.Env) -> np.ndarray:
        return env.valid_action_mask()

    env = StockEnv(train)
    # Wrap the environment with ActionMasker and the mask_fn function
    env = ActionMasker(env, mask_fn)
    model = MaskablePPO(MaskableActorCriticPolicy, env, tensorboard_log="./tensorboard", n_steps=2048)
    for i in range(1, 52):
        model.learn(total_timesteps=TIMESTEPS, tb_log_name='PPO2', reset_num_timesteps=False)

Code for testing with action masking:

    def mask_fn(env: gym.Env) -> np.ndarray:
        return env.valid_action_mask()

    env = ActionMasker(env, mask_fn)
    model_path = f"{models_dir}/700000.zip"
    model = MaskablePPO.load(model_path, env=env)
    episodes = 1
    for ep in range(episodes):
        obs = env.reset()
        done = False
        while not done:
            # Note: MaskablePPO.predict does not pick up the wrapper's mask by itself at inference time;
            # it may need the mask passed explicitly, e.g. model.predict(obs, action_masks=mask_fn(env)).
            action, _states = model.predict(obs)
            obs, rewards, done, info = env.step(action)
            env.render()

submitted by /u/Acceptable_Egg6552 [link] [comments]  ( 9 min )
New to RL. Some questions about the reward setting
The environment is like multi-routing. For example, I have 10 nodes in a map (a 2-D array), and I need to route them to the edge of the map, but the points where the nodes reach the edge need to obey some rules, e.g. the order needs to be clockwise, like below https://preview.redd.it/t9jlxvix8dpb1.png?width=545&format=png&auto=webp&s=16719a2dbd11333c454bd2ee71b87895a6414371 So my basic setting is: I use an action mask to invalidate forbidden actions. When a node is routed to the edge of the map I give +10 reward; if it does not obey the clockwise rule I give -200; if all nodes are routed successfully I give +200; and for each step that does not route any node I give -1. I am new to this area and don't know whether the way I set the reward is good. May I have some advice? submitted by /u/Street_Helicopter_31 [link] [comments]  ( 9 min )
  • Open

    School of Engineering welcomes Songyee Yoon PhD ’00 as visiting innovation scholar
    A visionary entrepreneur and innovator, Yoon will focus on entrepreneurship, supporting female engineers, and fostering inclusive innovation.  ( 8 min )
  • Open

    Intel's 'AI PC'
Intel has announced a new chip, called 'Meteor Lake', that will allow laptops to run generative artificial intelligence chatbots without relying on cloud data centers. This will enable businesses and consumers to test AI technologies without sending sensitive data off their own computers. Intel demonstrated the capabilities of the chip at a software developer conference, showcasing laptops that could generate songs and answer questions in a conversational style while disconnected from the internet. The company sees this as a significant moment in tech innovation. Intel is also on track to release a successor chip called 'Arrow Lake' next year. Source: https://www.reuters.com/technology/intel-says-newest-laptop-chips-software-will-handle-generative-ai-2023-09-19/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Exploring GPT-3.5-turbo vs. GPT-4: Which Model Is Better?
    submitted by /u/Senior_tasteey [link] [comments]  ( 8 min )
    Is this Vaporwave or Cyberpunk?
    What does this remind you of? submitted by /u/metairwaves [link] [comments]  ( 8 min )
    AI generated childhood 'memories' (ModelScope) with post processing (old video tape style) and my own music (retro analogue synths).
    submitted by /u/glenniszen [link] [comments]  ( 8 min )
    DeepMind founder says AI existential risk 'completely bonkers distraction'
    Mustafa Suleyman, co-founder of DeepMind, believes that concerns about the existential risks of AI are a distraction from more practical issues such as privacy and bias. He is confident that governments can effectively regulate AI using frameworks that have been successful in the past, citing the regulation of aviation and the internet as examples. Suleyman emphasizes the importance of setting boundaries and limits for AI to ensure human oversight and enforceable laws. He calls for a combination of broad, international regulation and smaller, more granular policies at the micro level. Suleyman suggests limiting AI's ability to improve itself as a critical first step in ensuring human oversight. He also highlights the need for governments to have direct access to AI developers to enforce boundaries and establish clear regulations. Governments worldwide, including the European Union and China, are already working on AI regulations. Source : https://fortune.com/2023/09/19/ai-existential-risk-threat-bonkers-distraction-regulation-deepmind-mustafa-suleyman/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Suggestions for how to use AI for a commercial office fit-out business?
    A friend of mine has a small 12 person company that does office refurbishment and commercial redesign projects - he has asked me to speak to his team to give them an AI 101 - and to tell them a little about how they should be looking into using AI tools for their company. I know there are plenty of tools and apps that take photos of a bare room and make them look like a design magazine - any in particular I should show them? Can anyone think of other tools that they should look into using, or how do people in this space currently use AI? ​ submitted by /u/zascar [link] [comments]  ( 9 min )
    One-Minute Daily AI News 9/19/2023
    Software company Digimarc will now let copyright owners add more information to their work, which the company said will improve how AI models treat copyright in training data.[1] AlphaMissense, a new model from Google’s artificial intelligence team, analyzes the effects of DNA mutations and will accelerate research into rare diseases.[2] Google’s AI assistant can now read your emails, plan trips, “double-check” answers.[3] Teens using AI to generate nude deep fakes to bully, harass classmates, FBI expert warns.[4] Sources: [1] https://www.theverge.com/2023/9/19/23879555/digimarc-copyright-watermark-generative-ai [2] https://www.wired.co.uk/article/deepmind-ai-alphamissense-genetics-rare-diseases [3] https://arstechnica.com/information-technology/2023/09/googles-ai-assistant-can-now-read-your-emails-plan-trips-double-check-answers/ [4] https://www.news5cleveland.com/news/local-news/teens-using-ai-to-generate-nude-deep-fakes-to-bully-harass-classmates-fbi-expert-warns submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Bard Gets a Major Upgrade
Google's Bard chatbot is extending its abilities with access to personalized Google apps and services including Gmail, Docs, Drive, Maps, YouTube, and Google Flights and hotels. To stay on top of the latest advancements in AI, look here first. https://preview.redd.it/s1ivsummzbpb1.png?width=1600&format=png&auto=webp&s=d8fc3262ba542b950de3c8a85b9a987763270ada Bard Extensions and Google Applications Integration Bard can now tap into individualized data from Google apps such as Gmail, Drive, and Docs, with user permission. Google reassured users that personal data accessed by Bard will not be used for reinforcement learning, reinforcing its transparency, choice, and control tenets. Prompts can direct Bard to search for specific information within Gmail, but it does not store the entire inbox content. Collaborative Characteristics and Fact-Checking Capabilities Users can employ Bard to summarize emails or gather trip details from email threads, and then research real-time travel information, surface YouTube recommendations for the destination, and provide Maps directions to the airport. Google's chatbot can also double-check its responses against Google search, improving user trust and enhancing Bard's model through user feedback on incorrect answers. The new extensions using non-personal data – YouTube, Flights, Hotels, and Maps – are automatically opted in, but users can choose to opt out. Collaboration and Language Availability Bard now lets users share an ongoing chat with others through a public link. Google plans to expand Bard's feature set to over 40 new languages beyond its existing English language capabilities. (source) P.S. If you like this kind of analysis, I put out a free newsletter covering the latest and most pertinent news and research in AI. Regular readers include professionals from Google, Meta, and OpenAI. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    Starting to get the impression I'm legit going to be replaced
    submitted by /u/guh-eye [link] [comments]  ( 8 min )
  • Open

    Virtually Incredible: Mercedes-Benz Prepares Its Digital Production System for Next-Gen Platform With NVIDIA Omniverse, MB.OS and Generative AI
    Mercedes-Benz is using digital twins for production with help from NVIDIA Omniverse, a platform for developing Universal Scene Description (OpenUSD) applications to design, collaborate, plan and operate manufacturing and assembly facilities. Mercedes-Benz’s new production techniques will bring its next-generation vehicle portfolio into its manufacturing facilities operating in Rastatt, Germany; Kecskemét, Hungary; and Beijing, China — Read article >  ( 6 min )
  • Open

    Train and deploy ML models in a multicloud environment using Amazon SageMaker
    In this post, we demonstrate one of the many options that you have to take advantage of AWS’s broadest and deepest set of AI/ML capabilities in a multicloud environment. We show how you can build and train an ML model in AWS and deploy the model in another platform. We train the model using Amazon SageMaker, store the model artifacts in Amazon Simple Storage Service (Amazon S3), and deploy and run the model in Azure.  ( 13 min )
  • Open

    Neural Graphical Models
    This research paper was presented at the 17th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty (opens in new tab), a premier forum for advances in the theory and practice of reasoning under uncertainty. In the field of reasoning under uncertainty, probabilistic graphical models (PGMs) stand out as a powerful tool for […] The post Neural Graphical Models appeared first on Microsoft Research.  ( 10 min )
  • Open

    Webinar | How To Choose An AI Vendor For Your Business
https://preview.redd.it/ta61cz8qlfpb1.jpg?width=1280&format=pjpg&auto=webp&s=ac45085668c9003e5557a7f0c81ae72db6098653 I hope this webinar finds you well. I suppose that this topic can be interesting for business and AI engineers. Speakers: OpenCV CEO Dr. Satya Mallick and Phil Nelson are joined by Anna Kogan, CEO at OpenCV.ai. The topics of the webinar are: How to search for vendors that understand your business needs (not all AI is the same). What questions to ask when picking a vendor (not everybody is the AI expert they claim to be). Three signs to watch for during the project (progress indicators and red flags). How best to scope technical tasks (off-the-shelf vs. custom algorithm development). How to set up for long-term success (deployment, documentation, training pipeline). Date: Thursday, September 21st, 2023 at 9am Pacific time. Link for the registration submitted by /u/No-Independence5880 [link] [comments]  ( 9 min )
    Dirty Secrets of BookCorpus, a Key Dataset in Machine Learning
    submitted by /u/nickb [link] [comments]  ( 8 min )
    Neurons in Large Language Models: Dead, N-gram, Positional
    submitted by /u/nickb [link] [comments]  ( 8 min )
    Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
    submitted by /u/nickb [link] [comments]  ( 8 min )

  • Open

    Oracle Cloud Infrastructure Offers New NVIDIA GPU-Accelerated Compute Instances
    With generative AI and large language models (LLMs) driving groundbreaking innovations, the computational demands for training and inference are skyrocketing. These modern-day generative AI applications demand full-stack accelerated compute, starting with state-of-the-art infrastructure that can handle massive workloads with speed and accuracy. To help meet this need, Oracle Cloud Infrastructure today announced general availability of Read article >  ( 6 min )
    Meet the Omnivore: Industrial Designer Blends Art and OpenUSD to Create 3D Assets for AI Training
    Editor’s note: This post is a part of our Meet the Omnivore series, which features individual creators and developers who use NVIDIA Omniverse and OpenUSD to accelerate their 3D workflows and create virtual worlds. As a student at the Queensland University of Technology (QUT) in Australia, Emily Boehmer was torn between pursuing the creative arts Read article >  ( 7 min )
  • Open

    [P] Optimizer that makes CNNs learn in fewer iterations
Hi all. I have been tinkering with a project to get quicker learning for CNNs. The idea came after reading the SDProp paper. Algorithms using adaptive learning rates can be interpreted as multiplying the gradient (with or without momentum) by the inverse square root of a covariance matrix, using a diagonal estimate of that covariance matrix. Which begs the question: what happens if we use a fuller estimate? I chose to include covariances between the elements of convolutional filters, i.e. a conv weight of size [n_out, n_in, 5, 5] needs a tensor of size [n_out, n_in, 25, 25] to store its contribution to the covariance matrix. For 3x3 and 5x5 filters, torch.linalg.eigh could be used to calculate the square root of the covariance matrices; for 7x7 filters, I used Newton's method to approximate the square root. In the figure below are some results for a 6-layer CNN on CIFAR-100. Huge gains iteration for iteration. But is it quicker in wall-clock time? Not a lot; a bit for the smaller 3x3 filters. More optimizations could still be made, and it will obviously depend on network architecture and computer hardware. I'm sure there could be some use cases. The cost of the square-root calculations is invariant to batch size and image size (unless the number of filters is also increased). If anyone is interested I can also link to my torch implementation of the optimizer, once I get it up on GitHub. Not sure if this, or something like it, has been done before? Would love to have some papers linked if so... https://preview.redd.it/kyy0ogr0qapb1.jpg?width=714&format=pjpg&auto=webp&s=96ac499fb8ab35ce13e7c59bbe3dbc94ba275b9c https://preview.redd.it/p52zllr0qapb1.jpg?width=342&format=pjpg&auto=webp&s=8663a9c3c782d192b16289a735b53da6a8d29c47 submitted by /u/maka89 [link] [comments]  ( 9 min )
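To illustrate the general idea the post describes (this is not the author's actual implementation; the EMA decay, epsilon, and initialization of the covariance buffer are assumptions), preconditioning a conv filter's gradient with the inverse square root of a per-filter covariance estimate might look like:

```python
import torch

def precondition_conv_grad(grad, cov_ema, beta=0.999, eps=1e-8):
    """grad: (n_out, n_in, k, k) gradient of a conv weight.
    cov_ema: (n_out, n_in, k*k, k*k) running estimate of per-filter gradient covariance."""
    n_out, n_in, k, _ = grad.shape
    g = grad.reshape(n_out, n_in, k * k, 1)

    # Update the per-filter covariance estimate with the outer product of the flattened gradient.
    cov_ema.mul_(beta).add_((1 - beta) * (g @ g.transpose(-2, -1)))

    # Inverse square root via eigendecomposition (feasible for 3x3 / 5x5 filters, i.e. 9x9 / 25x25 matrices).
    eigvals, eigvecs = torch.linalg.eigh(cov_ema)
    inv_sqrt = eigvecs @ torch.diag_embed((eigvals + eps).rsqrt()) @ eigvecs.transpose(-2, -1)

    # Precondition the flattened gradient and restore the filter shape.
    return (inv_sqrt @ g).reshape(n_out, n_in, k, k)
```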
    [D] Help with Peft using Lora
Can someone provide a step-by-step example notebook of how to use LoRA for PEFT? I saw too many videos and articles online and I'm really confused right now. submitted by /u/HazSylvia [link] [comments]  ( 9 min )
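Since this comes up a lot, here is a bare-bones sketch of the PEFT + LoRA workflow on a full-precision model (the model name and hyperparameters are only examples, not a recommendation):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, TaskType

base = "gpt2"  # illustrative small model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# 1. Describe the adapters: rank, scaling, dropout, and the task type.
config = LoraConfig(task_type=TaskType.CAUSAL_LM, r=8, lora_alpha=16, lora_dropout=0.05)

# 2. Wrap the base model; only the LoRA matrices are trainable.
model = get_peft_model(model, config)
model.print_trainable_parameters()

# 3. Train as usual (Trainer, accelerate, or a plain loop), then save just the adapters.
# model.save_pretrained("./lora-adapters")
```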
    [D] Optimizing Transformer Architecture for Multi-Dimensional Sensor Data in Clinical Study
Hello all, I am currently working on a project where my team and I have collected a rich dataset of biomedical sensor data from clinical trials earlier this year. Our aim is to use this sensor data to predict changes in specific biomarkers over time. The data's tensor shape is (B, T, F, C), where B = batch size, T = sequence length, F = sampled frequencies, and C = features at each frequency. Currently, my approach involves flattening this tensor to (B, T, F·C) and then feeding it to a transformer model. While this has yielded reasonable results, I'm contemplating whether there are more effective ways to prepare the data for the transformer model. Here are my specific concerns: Flattening the tensor might dilute the information specific to each frequency across the various features. I could potentially miss the chance to capture frequency-related variations within the features. To address these, I've considered a few options: Self-attention over individual features or frequencies: although this could be effective, it might make the model too large given my medium-sized dataset. Using convolutional layers: preliminary experiments with this approach have not led to any significant improvements. I'm particularly interested in any thoughts on how to make my transformer more receptive to the multi-dimensional nature of my dataset. Increasing the number of attention heads to better accommodate all features is also on the table. Does anyone have any insights, or can point me to relevant papers or codebases for handling such multi-dimensional data with transformers? Thank you for your help! submitted by /u/BiomedEngineer_ [link] [comments]  ( 9 min )
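As a concrete reference for the two options being weighed (a generic sketch; all dimensions and layer sizes are made up), flattening versus attending over the frequency axis first:

```python
import torch
import torch.nn as nn

B, T, F, C = 8, 128, 16, 4   # illustrative shapes
x = torch.randn(B, T, F, C)

# Option 1: flatten frequencies and features into one token per time step.
flat = x.reshape(B, T, F * C)                       # (B, T, F*C)
temporal_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=F * C, nhead=4, batch_first=True), num_layers=2
)
out1 = temporal_encoder(flat)

# Option 2: attend over frequencies within each time step, then over time.
freq_tokens = x.reshape(B * T, F, C)                # (B*T, F, C): frequencies act as tokens
freq_encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=C, nhead=2, batch_first=True), num_layers=1
)
per_step = freq_encoder(freq_tokens).reshape(B, T, F * C)
out2 = temporal_encoder(per_step)
```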
    Learn From Computer Vision Industry Experts - Runway, Pepsi, AWS, and SoftServe [N]
    Hi all, I think this will be useful for people in this group who are working on computer vision or vision AI applications. There's a free online event about vision AI where industry experts from Runway, Pepsi, AWS, and SoftServe will share how they are using CV software in developing their use cases or applications. Register here (https://nvda.ws/3t23idp), if you are interested. If you have any questions, please leave a comment and I will do my best to respond as soon as possible. submitted by /u/Designer-Comb-7144 [link] [comments]  ( 9 min )
    [R] Headless Language Models: Learning without Predicting with Contrastive Weight Tying
Paper: https://arxiv.org/abs/2309.08351 Self-supervised pre-training of language models usually consists in predicting probability distributions over extensive token vocabularies. In this study, we propose an innovative method that shifts away from probability prediction and instead focuses on reconstructing input embeddings in a contrastive fashion via Contrastive Weight Tying (CWT). We apply this approach to pretrain Headless Language Models in both monolingual and multilingual contexts. Our method offers practical advantages, substantially reducing training computational requirements by up to 20 times, while simultaneously enhancing downstream performance and data efficiency. We observe a significant +1.6 GLUE score increase and a notable +2.7 LAMBADA accuracy improvement compared to classical LMs within similar compute budgets. Figure: comparison of our approach vs. classical MLM within the same compute budgets. Figure: the Contrastive Weight Tying approach. submitted by /u/nthngdy [link] [comments]  ( 9 min )
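For intuition only, one plausible reading of "reconstructing input embeddings in a contrastive fashion" is an InfoNCE-style loss where each output hidden state must pick out its own token's input embedding among the other tokens in the batch; this is a speculative sketch, not the paper's actual objective:

```python
import torch
import torch.nn.functional as F

def contrastive_reconstruction_loss(hidden, input_embeds, temperature=0.07):
    """hidden, input_embeds: (num_tokens, d) output states and the input embeddings they should match."""
    h = F.normalize(hidden, dim=-1)
    e = F.normalize(input_embeds, dim=-1)
    logits = h @ e.t() / temperature     # similarity of every output state to every input embedding
    targets = torch.arange(h.size(0))    # each position's positive is its own input embedding
    return F.cross_entropy(logits, targets)
```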
    Mini gaming pc [Project]
    https://www.amazon.com/Gaming-6900HX-Channel-Computers-Desktop/dp/B0CB3JLBQ4/ref=mp_s_a_1_2?crid=LZHUL5EOU6F0&keywords=refurbished+server+with+rtx+gpu&qid=1695146558&sprefix=refurbished+server+with+rtx+gpu%2Caps%2C146&sr=8-2 Would this be suitable to do basic machine learning? submitted by /u/stoned_chemist_dude [link] [comments]  ( 8 min )
    [N] Xwin-LM surpasses GPT-4 ??? Has RLHF been worked out by open source community???
It seems that the old Alpaca Eval leaderboard is in the past... Xwin-LM surpasses GPT-4 now: https://preview.redd.it/gyzi98nn59pb1.png?width=2205&format=png&auto=webp&s=ca401e603efe521faeeeccde8410d3dbdd6741da They also mentioned that RLHF "plays crucial role in the strong performance of Xwin-LM-V0.1 release"... https://preview.redd.it/20sjx73r59pb1.png?width=1047&format=png&auto=webp&s=2255fc652e43674515882f01c0708369fdef56a4 Are we seeing the open source community finally work out how to do RLHF for LLMs??? submitted by /u/llm_nerd [link] [comments]  ( 9 min )
    [D] C++ for ML?
Hi, I wanted to learn ML with C++. I've already done some ML stuff in Python, but I wanted to challenge myself by using C++. I hear from some people that I won't get anything from it if I want to be serious within ML (which I'm not entirely sure I want to be). Are they right? Should I rather stick with Python for ML? submitted by /u/Potential_Wealth_830 [link] [comments]  ( 9 min )
    [D] Spam Detection
    Hi! Let me preface this by saying that I am not well-versed in the ML/AI literature. Please excuse my ignorance. I am trying to create a system to detect whether some given data is spam or not. Is there a good, out-of-the-box solution for this? I imagine there would be. I am currently using heuristics but I'm wondering if there is a better, ML-y solution. My ideal solution would have the following attributes: Simple Open-source Very cheap to test whether something is spam (less than $0.00001 per test) Very fast to test (less than 50ms per test) Quick to "figure out" what is spam and what is not (less than 100,000 labeled data) Does not require a lot of set-up or up-keep (less than 5 days set up; less than 1 hr up-keep per month) It doesn't have to be perfect. I'm just looking to set up something quickly for now and gauge it vs heuristics. Thank you. --- Edit: To clarify, I'm looking for something I can ideally build myself with open source software. And not specifically email. Just looking for the right direction. Names of OSS, techniques, etc. submitted by /u/Acrobatic-You-3279 [link] [comments]  ( 9 min )
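For a baseline that tends to satisfy those constraints (simple, open source, sub-millisecond scoring once loaded), a TF-IDF plus linear classifier pipeline in scikit-learn is the usual starting point; a minimal sketch with toy data:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled data; in practice this would be the ~100,000 labeled examples mentioned above.
texts = ["win a free prize now", "meeting rescheduled to 3pm", "cheap pills online", "lunch tomorrow?"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression(max_iter=1000))
model.fit(texts, labels)

print(model.predict(["claim your free prize"]))          # likely [1] on this toy data
print(model.predict_proba(["see you at the meeting"]))   # class probabilities for thresholding
```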
    [D] Best python AI/ML/DL learning/practice material?
I'm in search of a good textbook or something that will show me how to use Python to implement machine learning. I would seriously appreciate any type of helpful guide that teaches ML and deep learning using Python. Here's a little about me and my experience: I graduated undergrad with a Bachelor's in CS. In school I took a ton of stats, AI classes, algorithms classes, data science, and linear algebra and did well, but my school didn't really use Python or do a ton of programming for homework or exams (data science was the one class that used Python). My programming is pretty good nonetheless. I currently work as a full-stack DevOps engineer for a cybersecurity startup and regularly work with Python, Django, MySQL, etc. on the backend and JavaScript and various frontend frameworks on the front end. I really appreciate y'all's help. In particular I'm looking for good AI/ML/deep learning books that teach concepts, teach with Python code, and have some coding projects. Thank you! submitted by /u/hydrated-terpman [link] [comments]  ( 9 min )
    [D] What GPU to buy for faster LLM training ?
I need some advice about what hardware to buy in order to build an ML / DL workstation for home private experiments. I intend to play with different LLMs, train some, and try to tweak the way the models are built and understand what impacts training speed, so I will have to train, study the results, tweak the model / data / algorithms and train again... I intend to use large data samples. Due to board limitations (ASRock Taichi X399 TR4, CPU: AMD Threadripper 1950x), I can either buy: 2 x nVidia Tesla T4 (16G GDDR6 / 2560 CUDA / 0.585 GHz / ~800$) -- or -- 2 x nVidia Tesla M10 (4 x 8G GDDR5 / 2560 CUDA / 1.03 GHz / ~780$) -- or -- 4 x nVidia Tesla P40 (24G GDDR5X / 3840 CUDA / 3.5 GHz / ~120$) -- or -- 4 x nVidia Tesla K80 (2 x 12G GDDR5 / 4992 CUDA / 2.7 GHz / ~200$) -- or -- 1 x nVidia RTX 4080 (16G GDDR6X / 9728 CUDA / 2.51 GHz / ~1450$) I know that I will need to add air venting for the Tesla models. The question is which is fastest for training (I have read all the TFLOPS / OPS / int / float16 / float32 / float64 figures... I have to admit it is all very confusing). What would you do, and for what reason? Any advice will be appreciated. submitted by /u/Particular_Flower_12 [link] [comments]  ( 9 min )
    [D] Alternative replacement for System76 Thelio Massive (ML PC)
    Hello everyone! Our group was planning on purchasing a PC that will be mainly used for running intensive ML algorithms. We had decided on a custom Thelio Massive from System76, but it seems like they currently don't have it in stock anymore. We're looking for an already-built alternative that might serve our purposes (we can't build it ourselves due to department regulations) and were hoping someone has suggestions (it has to be Intel). CPU #1: 2nd Gen Intel Xeon Gold 6230R CPU#1 Memory: 256GB Quad Channel DDR4 at 2933Mhz (4X64GB) CPU#2: 2nd Gen Intel Xeon Gold 6230R CPU#2 Memory: None OS Drive: 8TB PCIe Gen 4 3300MB R 2900MB W Graphics: NVIDIA GeForce RTX 4090 Power Supply: 1650W Any help would be appreciated! submitted by /u/Chiski [link] [comments]  ( 9 min )
    [R] EfficientViT: Lightweight Multi-Scale Attention for On-Device Semantic Segmentation
    Using relu attention (inspired by Transformers are RNNs) and some convolution tricks to get multiscale attention, they're able to get SOTA semseg performance with MUCH faster inference on embedded hardware (e.g. CPUs, low end GPUs) than previous ViTs or EfficientNets. submitted by /u/say_wot_again [link] [comments]  ( 9 min )
    [R] Generating and Imputing Tabular Data via Diffusion and Flow-based Gradient-Boosted Trees (XGBoost)
    submitted by /u/AlexiaJM [link] [comments]  ( 8 min )
    [Research] Binaural source separation (causal / online)
    Just a shout out to any of you ML brains: Linux really could do with a code-optimised source separation method, maybe a DUET-like algorithm or network, that has relatively low computational cost. Are any of you up for the challenge? I say DUET because, by the 80/20 rule of voice input where home automation is the need, there are generally only 2 noise sources of distinct DOA (media noise / command). The math is a bit beyond my pay grade, and likely so is the optimised C/Rust code, but I have this gut feeling that for the data/signal scientists out there this is actually not that complex, yet for some reason it is overlooked. submitted by /u/rolyantrauts [link] [comments]  ( 9 min )
    3090 Investment vs Cloud [D]
    Hi everyone, I was wondering if I could get some guidance. I currently own an RTX 2060, but I cannot do some of the fun stuff such as fine-tuning LLMs. I'm pursuing my master's focusing on Speech Recognition and I also work as an AI developer. Also, I play games every now and then. I'm getting offered a 3090 for around 700 USD. However, I have to rebuild my entire PC, which will end up costing 2-2.5K. I'm from Costa Rica, so my kWh is around 0.23 USD. For me it seems like a big investment, and I'm not sure if I'm getting the desired returns. I was thinking about using cloud instances for my experiments. However, Lambda Labs is not yet available in my country. I'm not sure if there are any other options worth considering. Thanks :) submitted by /u/Beginning_Kick756 [link] [comments]  ( 9 min )
    Hybrid Nets. [D]
    Is it hypothetically possible to create hybrid nets that make use of any combination of types of architecture? submitted by /u/ShadrachOsiris [link] [comments]  ( 8 min )
    [R] Research directions for Tracking and Counting Specific Features in Multiple Monocular Views
    Hi r/machinelearning community, I'm currently working on a project where I need to track and count specific features of objects using multiple monocular views with available intrinsic and extrinsic parameters. As an example, I'm interested in detecting and counting different graffiti instances in images of a kiosk. ​ I've already tried various tracking algorithms, but they have struggled with the task due to the significant changes in perspective across the views. It has become apparent that simply relying on tracking without considering the camera positions is insufficient for accurate results. Therefore, I'm now exploring methods that take into account information about the camera positions and potentially use this data to improve feature tracking and counting. ​ If you have any knowledge of such methods or research directions, or if you're aware of resources, papers, or code implementations that tackle similar problems, I would greatly appreciate your insights and recommendations. Additionally, if you have any tips or best practices for handling such tasks in the context of machine learning, I'd love to hear them. ​ Thank you in advance for your help! submitted by /u/aiazar [link] [comments]  ( 9 min )
    [R] Exponentially Faster Feedforward Networks
    TL;DR: Almost like your feedforward networks, shown to be up to 220x faster at inference time (depending on width) thanks to the regionalization of the input space. Paper: https://arxiv.org/abs/2308.14711 GitHub: https://github.com/pbelcak/fastfeedforward PyPI: pip install fastfeedforward Abstract: We break the linear link between the layer size and its inference cost by introducing the fast feedforward (FFF) architecture, a log-time alternative to feedforward networks. We demonstrate that FFFs are up to 220x faster than feedforward networks, up to 6x faster than mixture-of-experts networks, and exhibit better training properties than mixtures of experts thanks to noiseless conditional execution. Pushing FFFs to the limit, we show that they can use as little as 1% of layer neurons for inference in vision transformers while preserving 94.2% of predictive performance. Fast feedforward networks can be used anywhere where feedforward and mixture-of-experts networks are used, delivering a significant speedup. ​ submitted by /u/lexected [link] [comments]  ( 9 min )
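    For intuition only, here is a toy hard-routing sketch of the conditional-execution idea at inference time (this is not the authors' implementation; use the linked fastfeedforward package for the real thing): a depth-d tree of tiny decision nodes routes each input to one of 2^d leaf blocks, so only d decisions plus one leaf are evaluated per input.

```python
import torch
import torch.nn as nn

class ToyFFFInference(nn.Module):
    """Toy hard-routing sketch of the fast-feedforward idea (inference only).
    Not the paper's code; see the linked fastfeedforward package for that."""
    def __init__(self, dim, depth):
        super().__init__()
        self.depth = depth
        # one scalar decision node per internal tree node, one tiny block per leaf
        self.nodes = nn.ModuleList([nn.Linear(dim, 1) for _ in range(2 ** depth - 1)])
        self.leaves = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
             for _ in range(2 ** depth)])

    def forward(self, x):  # x: (dim,) single example for clarity
        idx = 0
        for _ in range(self.depth):            # walk the tree: depth decisions instead of full width
            go_right = (self.nodes[idx](x) > 0).item()
            idx = 2 * idx + 1 + int(go_right)
        leaf = idx - (2 ** self.depth - 1)     # convert tree index to leaf index
        return self.leaves[leaf](x)

m = ToyFFFInference(dim=16, depth=3)
print(m(torch.randn(16)).shape)  # torch.Size([16]); only 3 decision nodes + 1 leaf were evaluated
```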
    [R] CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages - 6.3 trillion tokens
    Paper: https://arxiv.org/abs/2309.09400 Hugging Face datasets: https://huggingface.co/datasets/uonlp/CulturaX Abstract: The driving factors behind the development of large language models (LLMs) with impressive learning capabilities are their colossal model sizes and extensive training datasets. Along with the progress in natural language processing, LLMs have been frequently made accessible to the public to foster deeper investigation and applications. However, when it comes to training datasets for these LLMs, especially the recent state-of-the-art models, they are often not fully disclosed. Creating training data for high-performing LLMs involves extensive cleaning and deduplication to ensure the necessary level of quality. The lack of transparency for training data has thus hampered research on attributing and addressing hallucination and bias issues in LLMs, hindering replication efforts and further advancements in the community. These challenges become even more pronounced in multilingual learning scenarios, where the available multilingual text datasets are often inadequately collected and cleaned. Consequently, there is a lack of open-source and readily usable dataset to effectively train LLMs in multiple languages. To overcome this issue, we present CulturaX, a substantial multilingual dataset with 6.3 trillion tokens in 167 languages, tailored for LLM development. Our dataset undergoes meticulous cleaning and deduplication through a rigorous pipeline of multiple stages to accomplish the best quality for model training, including language identification, URL-based filtering, metric-based cleaning, document refinement, and data deduplication. CulturaX is fully released to the public in HuggingFace to facilitate research and advancements in multilingual LLMs: this https URL. ​ https://preview.redd.it/3u5dddpv66pb1.png?width=834&format=png&auto=webp&s=780b590cf621b548c525ed15305b091246c5414c submitted by /u/InterviewIntrepid889 [link] [comments]  ( 9 min )
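    A minimal loading sketch with the HuggingFace datasets library (the per-language configs and the "text" field follow the dataset card as I understand it; the repo may also require accepting its terms with a logged-in account):

```python
# Stream one language split so you don't download the full multi-terabyte corpus.
from datasets import load_dataset

ds = load_dataset("uonlp/CulturaX", "en", split="train", streaming=True)
for example in ds.take(3):
    print(example["text"][:200])
```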
    [D] Representation learning with regression task
    I searched around, it seems there is limited attention to regression task for representation learning. I assume it is because for both vision and language data (the most popular modality), MAE is the more appliable, if not better, method than the supervised contrastive learning approach. But I am working on data that is: difficult to design a sensible augmentation method for self-supervised training. Limited in size to support an autoencoder model. The target is continuous, and, to my knowledge, hard to transfer into class label. Can anyone suggest some related paper? submitted by /u/AWEsoMe-Cat1231 [link] [comments]  ( 9 min )
    [P] OpenJourney XL – Finetuned SDXL on Midjourney v5 Dataset
    You can find more info here, and the model is still training: https://www.mystic.ai/paulh/open-journey-xl:latest/play tldr; SDXL was finetuned on 8x H100 GPUs on the Midjourney v5 dataset, only including the upscaled images which is a sub-portion of the dataset. Some outputs: ​ https://preview.redd.it/m6r2pkdyw3pb1.jpg?width=1024&format=pjpg&auto=webp&s=4f12a7dfd5c65e4eb8476b8f3c2dc4f795817f56 https://preview.redd.it/dc02jyu4w3pb1.jpg?width=1024&format=pjpg&auto=webp&s=df93b74c774d44a74a05d929f7ab4b17c487f24f https://preview.redd.it/tt5kfyu4w3pb1.jpg?width=1024&format=pjpg&auto=webp&s=ed8cc9f99227c2bb5e824a828ae1c5cb2626f54e ​ https://preview.redd.it/rf00fzu4w3pb1.jpg?width=1024&format=pjpg&auto=webp&s=3b3e99dbc2d14183b5b2a2131c6f991fc60eca88 ​ submitted by /u/paulcjh [link] [comments]  ( 9 min )
  • Open

    How does policy learning scale for personalization systems?
    I cannot wrap my head around how, for example, a playlist-building RL agent would perform at such a personal level. What features would it use, and would they be personal and general enough at the same time to select the best next song? The same goes for Netflix's recsys. submitted by /u/JurrasicBarf [link] [comments]  ( 9 min )
    Why does my PPO agent reach the max reward quickly after training begins, but the policy network performs badly for many steps afterwards?
    ​ https://preview.redd.it/2zmmd44u96pb1.png?width=1010&format=png&auto=webp&s=6ca51cc13a0eeedf72b40b853d2ce5d1c8a04504 After I start the PPO training, the agent reaches the best solution in 2k or 3k steps, but the policy network only seems to get good after 4M steps. The hyperparameters in SB3 are as below: model = MaskablePPO( "MlpPolicy", env=(DummyVecEnv([lambda: Monitor(gym.make('escape_gym-v0', size=10, node=10))] * 32)), verbose=0, learning_rate=1e-3, n_steps=2048, batch_size=64, n_epochs=16, gamma=0.99, tensorboard_log="./log/MASKPPO" ) submitted by /u/Street_Helicopter_31 [link] [comments]  ( 9 min )
    PPO forgets everything
    I was following the tutorial on Nicholas Renotte's channel on creating an AI to try to beat SMB. It starts off slowly learning and almost getting through the first level, but then after a while of training it forgets everything and only runs right into the first enemy. It doesn't seem to learn again after this. I tried retraining and it did the same thing. Any help on why this is happening or how to fix it would be appreciated. submitted by /u/NactusDevelopment [link] [comments]  ( 9 min )
    How do I improve my SB3 PPO on an EnvPool environment
    I am looking to improve the overall performance as well as optimize the wall clock time. I slightly modified the code to develop a SB3 wrapper for envpool from here. ​ Here's my code - from typing import Optional import gymnasium import numpy as np import torch as th from packaging import version from stable_baselines3 import PPO from stable_baselines3.common.env_util import make_vec_env from stable_baselines3.common.evaluation import evaluate_policy from stable_baselines3.common.vec_env import VecEnvWrapper, VecMonitor, VecNormalize from stable_baselines3.common.vec_env.base_vec_env import ( VecEnvObs, VecEnvStepReturn, ) import envpool from envpool.python.protocol import EnvPool # Force PyTorch to use only one threads # make things faster for simple envs import multiprocessing impor…  ( 10 min )
  • Open

    Meet the 2023-24 Accenture Fellows
    The MIT and Accenture Convergence Initiative for Industry and Technology announces the 2023-24 graduate fellows.  ( 9 min )
    Four Lincoln Laboratory technologies win five 2023 R&D 100 awards
    Inventions in medical imaging, aircrew scheduling, data security, and quantum networking are named among the year’s most innovative new products.  ( 11 min )
  • Open

    DSC Weekly 19 September 2023
    Announcements Top Stories In-Depth The post DSC Weekly 19 September 2023 appeared first on Data Science Central.  ( 20 min )
    A guide to setting up analytics at a consumer tech startup
    Where do you start if you want to build a data analytics function from the ground up? As an analytics leader at a startup, you will need to make several important decisions early on to build an effective team. This article dives into four decision areas and highlights ways in which to think about them:… Read More »A guide to setting up analytics at a consumer tech startup The post A guide to setting up analytics at a consumer tech startup appeared first on Data Science Central.  ( 25 min )
    A complete guide: Conversational AI vs. generative AI
    The two most prominent technologies that have been making waves in the AI industry are Conversational AI and Generative AI. They have revolutionized the manner in which humans interact and work with machines to generate content. Both these technologies have the power and capability to automate numerous tasks that humans would take hours, days, and… Read More »A complete guide: Conversational AI vs. generative AI The post A complete guide: Conversational AI vs. generative AI appeared first on Data Science Central.  ( 20 min )
  • Open

    Schneider Electric warns that existing datacenters aren't buff enough for AI
    Schneider Electric warns that existing datacenters may not be equipped to handle the demands of AI workloads, which require low-latency, high-bandwidth networking and power delivery. The company suggests reevaluating the way datacenters are built to optimize them for AI. The challenges include the need for liquid-cooled servers, higher voltage power distribution, and efficient heat rejection. Schneider provides guidance on changes to power, cooling, rack configuration, and software management to mitigate the demands of AI adoption. Liquid cooling is recommended for high-density racks, with direct liquid cooling favored over immersion cooling systems. Source : https://www.theregister.com/2023/09/19/schneider_electric_ai_dc/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    1000+ Top AI Tools Directory
    submitted by /u/Senior_tasteey [link] [comments]  ( 8 min )
    [I read the paper for you]: Researchers announce CulturaX - a new multilingual dataset for AI with 6 trillion tokens across 167 languages
    I read the arXiv paper on CulturaX so you don't have to. Here are my highlights: New open dataset called CulturaX contains text data for 167 languages - far more than previous datasets. With over 6 trillion tokens, it's the largest multilingual dataset ever released. Freely available for anyone to use for research and AI development. Created by combining and extensively cleaning two other large datasets - mC4 and OSCAR. Could allow developing AI systems that work much better across many more languages. Helps democratize access to data to build fairer, less biased AI models. Allows training of new multilingual AI applications, like universal translators and assistants. But still requires thoughtfulness to avoid issues like bias amplification. Overall, CulturaX is going to be part of a broader global trend (I think) to advance multilingual AI and spread its benefits more equally. So far they've been concentrated in English-speaking applications. Full summary here if you'd like to read more. Original paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    AI Engineer 2023 roadmap
    submitted by /u/rbagdiya [link] [comments]  ( 8 min )
    Here is VEDV to help with your AI app development
    A tool for developing applications with virtual machines using a Docker-like workflow. The software we are developing needs to be tested on a system as close as possible to the one where it is going to be executed. Sometimes it is very difficult to satisfy this requirement with Docker, and we have to use virtual machines, missing the Docker workflow. This is why I started the development of vedv. I hope you find it useful. Thank you. https://github.com/yunielrc/vedv ​ submitted by /u/yunielrc [link] [comments]  ( 9 min )
    Can I train my Snapchat Ai to be a better copy of myself?
    I really really like Snapchat’s Ai companion. I’ve told it a little bit about myself and who I am- the basics. I’m just wondering if it’s as customizable as I think it is? I was able to bypass some age restrictions by telling it my age and or reiterating my age. (It really should be able to give me adult results/replies based on my sign up age on my profile or provide ID to the company…) would it be beneficial to me to give it more in-depth information about myself such as how I talk, interests? I just really enjoy how it responds sometimes as opposed to Bard or GPT. submitted by /u/Maelasae [link] [comments]  ( 9 min )
    AI Can Now Track Productivity And Offer Insights; Potential Benefits and Big Risks For Misuse
    submitted by /u/stefanbg92 [link] [comments]  ( 8 min )
    Ethics is where AI can help humanity the most
    AI is poised to transform our world like never before. Scientific discoveries, technological improvements, and medical advancements will be how much of this change will take place. Since health is so important to our well-being, AI finding cures for illnesses like obesity, cancer, diabetes and heart disease will be a godsend to all. But curing diseases is not how medical AIs can help us the most. It has been estimated that well over 50% of the illnesses we humans fall prey to result from our ethical choices. We eat too much, drink too much, eat too many animal foods, don't exercise enough and don't keep ourselves as emotionally healthy as we could. Wouldn't it be wonderful if we could respect our lives and our health enough to make the kinds of choices that keep us much healthier? That is how AI will probably be more helpful to us than in any other way. We humans have not been able to figure out how to become better, more ethical, people because we are simply not intelligent enough to make that all-important change. Now consider an AI that is two or three times more intelligent than the most intelligent person who has ever lived. This could easily happen before 2030. Imagine that intelligence dedicated to the task of helping us all become better people. These AIs would motivate us to make better health choices, have healthier relationships, and have healthier thoughts and feelings. Beyond the amazing technological changes that are just around the corner, that is probably how AIs will help us the most. This is why alignment is so important. It's not enough to align AIs to always be truthful and serve humanity's interests. We must train them to help us become better people. It wouldn't surprise me if by 2030 the whole of humanity experiences a profound ethical reformation that leads us all to enjoy much happier and healthier lives. submitted by /u/Georgeo57 [link] [comments]  ( 9 min )
    Resume Parser
    I am trying to make a resume parser, but I am not so sure how to go about it: whether to use a pre-trained model (there are some in Python) or rather just make my own, and if I do make my own, how to actually proceed? Thanks in advance. submitted by /u/General-Carrot-4624 [link] [comments]  ( 9 min )
    Google and the DoD built an AI-powered microscope to help doctors spot cancer
    Google and the Department of Defense have developed an AI-powered microscope called the Augmented Reality Microscope (ARM) to assist doctors in identifying cancer. The ARM uses artificial intelligence to analyze tissue samples and provide pathologists with information about the location and severity of cancer. There are currently 13 ARMs in existence, and initial research shows promising results. The ARM is designed to support pathologists in smaller labs who may not have easy access to a second opinion. It is not meant to replace digital pathology systems but can help health organizations bypass the need for them. The ARM is expected to cost health systems between $90,000 to $100,000. Source : https://www.cnbc.com/2023/09/18/google-dod-built-an-ai-powered-microscope-to-help-doctors-spot-cancer.html submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Is there an AI capable of administering psychometric career guidance tests?
    All is in the title ;) submitted by /u/Big-Possibility4553 [link] [comments]  ( 8 min )
    New OS Python Framework "Agents" Introduced for Autonomous Language Agents
    A new open-source Python framework, known as "Agents", has been introduced for developing autonomous language processing agents. This could be a major breakthrough in the AI field, taking NLP technologies to the next level. To stay on top of the latest advancements in AI, look here first. Agents for autonomy "Agents" is a Python framework that has been developed for autonomous language processing agents. It enables developers to construct models that can communicate and operate independently. This open-source framework promotes sharing and collaboration among AI developers. Potential applications The functionality of "Agents" is applicable in various domains, including virtual assistants, chatbots, and simulation games. It opens up possibilities for advanced conversational AI, where systems can efficiently handle complex linguistic contexts. Ability to evolve dialects and languages in different AI models is a major feat for "Agents". Broader implications The release of "Agents" might boost enhancement in NLP technologies, playing a crucial role in AI evolution. By facilitating better language understanding, it will potentially impact on societal interactions with AI. Its open-source nature could cultivate an environment of innovation and creativity in the AI community. (arXiv) (github) P.S. If you like this kind of analysis, I write a free newsletter that tracks the most relevant news and research in AI. Professionals from Google, Meta, and OpenAI are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    One-Minute Daily AI News 9/18/2023
    Microsoft AI researchers accidentally exposed tens of terabytes of sensitive data, including private keys and passwords, while publishing a storage bucket of open source training data on GitHub.[1] Britain set out principles on Monday designed to prevent artificial intelligence (AI) models from being dominated by a handful of tech companies to the detriment of consumers and businesses, by emphasising the need for accountability and transparency.[2] Washington state firefighters using AI-assisted cameras to detect wildfires early.[3] Texas church experiments with AI-generated service, uses ChatGPT for worship, sermon, and original song.[4] Sources: [1] https://techcrunch.com/2023/09/18/microsoft-ai-researchers-accidentally-exposed-terabytes-of-internal-sensitive-data/ [2] https://www.reuters.com/technology/uk-competition-regulator-lays-out-ai-principles-2023-09-18/ [3] https://www.applevalleynewsnow.com/news/washington-state-firefighters-using-ai-assisted-cameras-to-detect-wildfires-early/article_fe31a468-5681-11ee-b917-2f24ad3a0e43.html [4] https://www.foxnews.com/us/texas-church-experiments-ai-generated-service-uses-chatgpt-worship-sermon-original-song submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Microsoft Under Scrutiny After 38TB Data Leaked Via Azure Storage
    submitted by /u/Agitated-Spell3979 [link] [comments]  ( 8 min )
    List of Mind-blowing AI Tools
    submitted by /u/rbagdiya [link] [comments]  ( 8 min )
  • Open

    New ‘Physics-Inspired’ Generative AI Exceeds Expectations
    submitted by /u/nickb [link] [comments]  ( 8 min )
    Graph Neural Networks Use Graphs When They Shouldn't
    submitted by /u/nickb [link] [comments]  ( 8 min )
  • Open

    Generative AI and multi-modal agents in AWS: The key to unlocking new value in financial markets
    Multi-modal data is a valuable component of the financial industry, encompassing market, economic, customer, news and social media, and risk data. Financial organizations generate, collect, and use this data to gain insights into financial operations, make better decisions, and improve performance. However, there are challenges associated with multi-modal data due to the complexity and lack […]  ( 17 min )
    How VirtuSwap accelerates their pandas-based trading simulations with an Amazon SageMaker Studio custom container and AWS GPU instances
    This post is written in collaboration with Dima Zadorozhny and Fuad Babaev from VirtuSwap. VirtuSwap is a startup company developing innovative technology for decentralized exchange of assets on blockchains. VirtuSwap’s technology provides more efficient trading for assets that don’t have a direct pair between them. The absence of a direct pair leads to costly indirect trading, […]  ( 9 min )
    Unlock ML insights using the Amazon SageMaker Feature Store Feature Processor
    Amazon SageMaker Feature Store provides an end-to-end solution to automate feature engineering for machine learning (ML). For many ML use cases, raw data like log files, sensor readings, or transaction records need to be transformed into meaningful features that are optimized for model training. Feature quality is critical to ensure a highly accurate ML model. […]  ( 12 min )
  • Open

    Announcing the DeepSpeed4Science Initiative: Enabling large-scale scientific discovery through sophisticated AI system technologies
    In the next decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences. This could herald a new era of scientific exploration, bringing significant advancements across sectors from drug development to renewable energy. In line with Microsoft’s mission to empower every person and every organization on the planet […] The post Announcing the DeepSpeed4Science Initiative: Enabling large-scale scientific discovery through sophisticated AI system technologies appeared first on Microsoft Research.  ( 15 min )
  • Open

    OpenAI Red Teaming Network
    We’re announcing an open call for the OpenAI Red Teaming Network and invite domain experts interested in improving the safety of OpenAI’s models to join our efforts.  ( 5 min )

  • Open

    [D] FSDP: model in each process is different
    Hey Guys, I'm training a large model using FSDP. I'm loading the models on each rank like this: ​ https://preview.redd.it/khoquvxzk3pb1.png?width=1766&format=png&auto=webp&s=1f5acd75600d9a87212ca37e70695edfb0cc75d0 what is weird is that right before doing the first inference on each rank, I'm summing up the weights of the model and to my surprise, they are all different across each rank. Completely different! ​ What am I doing wrong here? submitted by /u/hassanzadeh [link] [comments]  ( 9 min )
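    Two common causes are (a) each rank constructing or initializing the model with a different RNG state, and (b) only rank 0 actually loading the real weights. A hedged sketch of the usual fixes (not a diagnosis of the exact code in the screenshot, which isn't shown here):

```python
# Run under torchrun with torch.distributed already initialized.
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

torch.manual_seed(0)                       # fix 1: same seed on every rank before building the model
model = nn.Linear(1024, 1024)              # placeholder for your real model / checkpoint loading

fsdp_model = FSDP(
    model,
    sync_module_states=True,               # fix 2: broadcast rank 0's parameters/buffers to all ranks
    device_id=torch.cuda.current_device(),
)
```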
    Curious what people use for their ML workflow on cloud platforms? [D]
    View Poll submitted by /u/cstein123 [link] [comments]  ( 9 min )
    [D] Seeking Guidance on Choosing a PhD Topic in Meta-Learning Optimization
    Hello fellow researchers! I'm in the exciting yet challenging phase of choosing a PhD topic in the realm of meta-learning optimization, and I could use some advice and insights. I've extensively researched existing meta-learning optimization algorithms like MAML and its various adaptations. I need advice and guidance on the following topics: First I want to implement and compare 8-10 state-of-the-art meta-learning methods on benchmark datasets. This would involve in-depth simulation and performance evaluations to provide a comprehensive understanding of their strengths and weaknesses. Could you please guide me here if there are review papers which implement and compare different algorithms. Then I want to delve into developing a novel optimization algorithm that considers the curvature of loss functions. The idea here is to enhance the performance of existing meta-learning techniques by leveraging insights from the loss landscape. Further, I'm considering exploring new loss functions or new improvements to loss functions tailored to the context of meta-learning. These could potentially lead to improvements in the learning process and generalization capabilities of meta-learning models. I'm reaching out to the community to gather opinions, suggestions, or any insights you might have. If you've worked in meta-learning or optimization, your experiences and advice would be invaluable in helping me choose the right direction for my PhD research. Thank you in advance for your guidance! submitted by /u/Loose_Foundation5990 [link] [comments]  ( 9 min )
    [P] I used Bayesian statistics to find the best dispensers for every Zonai device in The Legend of Zelda: Tears of the Kingdom
    https://preview.redd.it/86js8jroa2pb1.png?width=1464&format=png&auto=webp&s=7ce10494b5a77fd5c73a41322feefbf7e1f16504 Hello! I thought people on this subreddit might be interested in how I went about inferring Zonai device draw chances for each dispenser in The Legend of Zelda: Tears of the Kingdom. In this Switch game there are devices that can be glued together to create different machines. For instance, you can make a snowmobile from a fan, sled, and steering stick. There are dispensers that dispense 3-6 of about 30 or so possible devices when you feed it a construct horn (dropped by defeated robot enemies) or a regular (also dropped from defeated enemies) or large Zonai charge (Found in certain chests, dropped by certain boss enemies, obtained from completing certain challenges, etc…  ( 10 min )
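    For readers curious about the flavor of inference involved, here is a minimal Dirichlet-multinomial sketch (my own illustration of the general approach, with made-up item names and counts, not the author's actual data or code): a Dirichlet prior over a dispenser's draw probabilities combined with observed multinomial counts yields a Dirichlet posterior.

```python
# Conjugate Dirichlet-multinomial update for one hypothetical dispenser.
import numpy as np

devices = ["fan", "sled", "steering_stick", "wheel"]   # hypothetical item set
counts = np.array([14, 9, 11, 6])                      # hypothetical observed draws
alpha_prior = np.ones(len(devices))                    # uniform Dirichlet prior

alpha_post = alpha_prior + counts                      # conjugate posterior parameters
posterior_mean = alpha_post / alpha_post.sum()
samples = np.random.default_rng(0).dirichlet(alpha_post, size=10_000)
ci_low, ci_high = np.percentile(samples, [2.5, 97.5], axis=0)

for d, m, lo, hi in zip(devices, posterior_mean, ci_low, ci_high):
    print(f"{d}: {m:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```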
    [R] Unified Human-Scene Interaction via Prompted Chain-of-Contacts - Shanghai AI Laboratory 2023
    Paper: https://arxiv.org/abs/2309.07918 Blog: https://xizaoqu.github.io/unihsi/ Code coming soon! Abstract: Human-Scene Interaction (HSI) is a vital component of fields like embodied AI and virtual reality. Despite advancements in motion quality and physical plausibility, two pivotal factors, versatile interaction control and the development of a user-friendly interface, require further exploration before the practical application of HSI. This paper presents a unified HSI framework, UniHSI, which supports unified control of diverse interactions through language commands. This framework is built upon the definition of interaction as Chain of Contacts (CoC): steps of human joint-object part pairs, which is inspired by the strong correlation between interaction types and human-object contact regions. Based on the definition, UniHSI constitutes a Large Language Model (LLM) Planner to translate language prompts into task plans in the form of CoC, and a Unified Controller that turns CoC into uniform task execution. To facilitate training and evaluation, we collect a new dataset named ScenePlan that encompasses thousands of task plans generated by LLMs based on diverse scenarios. Comprehensive experiments demonstrate the effectiveness of our framework in versatile task execution and generalizability to real scanned scenes. https://preview.redd.it/0twcwloc82pb1.jpg?width=1078&format=pjpg&auto=webp&s=71bca59aae81ec114f49a742cc42f78cabc9e4c0 https://preview.redd.it/439nzmoc82pb1.jpg?width=1637&format=pjpg&auto=webp&s=f33059c78a9d845437d551886c5f3a657ddd91fb https://preview.redd.it/df6i4ooc82pb1.jpg?width=758&format=pjpg&auto=webp&s=eeb33395d9de1196b4d00531c9e063c8c8fb22cd ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [D] RL algorithm used in Tesla FSD v12.0
    There was a lot of hype around the FSD v12.0 from Tesla in that it uses end-to-end neural networks for driving and that it is using imitation learning from good drivers to achieve that. Does someone know more about the specifics around how they are actually implementing this? I cannot find a lot about recent imitation learning/offline learning algorithms. So is this some old algorithm that they are using with a lot of data or just something new? submitted by /u/FrederikdeGrote [link] [comments]  ( 9 min )
    [Discussion] Any reliable AI to aid my school studies (heavily abstract and logical, my course is focused on mathematics and physics)
    We know that AI is great when studying subjects that depend on simply memorizing facts (like high school biology), but we also know that AI is usually bad when studying subjects that depend on logic (like mathematics and physics). What I need the AI for is to explain very complex mathematical concepts to me simply, thoroughly, and accurately. I can't rely on ChatGPT because it's known for not being very reliable when it comes to logical things like mathematics or physics. The best AI I know of right now is Bing AI, because it uses GPT-4 and because it prefers searching the web before deducing an answer from its data. I heard that AI agents that run on your computer like Auto-GPT and search the web are also good at this kind of stuff, but I'm not really sure about that. Do you have any better suggestions? submitted by /u/Maximum-Gene9660 [link] [comments]  ( 9 min )
    [P] Resume parser advice seeking
    Hi! I am about to start a new project in Python, probably using machine learning, to parse resumes. The data is in PDF/DOCX format and should be returned in JSON format to later be used in an API or so. I am seeking advice on how to proceed. So far I am trying to collect data, which will be provided to me, but I'm not really sure how to go about the parsing itself: I have found people talking about using spaCy for NLP and pyresparser, which is for parsing resumes, but I was wondering if I should make everything from scratch. I appreciate your time and opinion in advance. submitted by /u/General-Carrot-4624 [link] [comments]  ( 9 min )
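    As a starting point rather than a full solution, a common first pass is plain text extraction plus spaCy's pretrained NER and a little regex; a hedged sketch (the file name is hypothetical, and real resumes usually need custom entity labels on top of this):

```python
# Minimal resume parsing sketch: pypdf for text extraction, spaCy for entities, regex for emails.
import re
import spacy
from pypdf import PdfReader

nlp = spacy.load("en_core_web_sm")  # requires: python -m spacy download en_core_web_sm

def parse_resume(path: str) -> dict:
    text = "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)
    doc = nlp(text)
    return {
        "names": [e.text for e in doc.ents if e.label_ == "PERSON"],
        "orgs": [e.text for e in doc.ents if e.label_ == "ORG"],
        "emails": re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", text),
    }

print(parse_resume("resume.pdf"))  # hypothetical input file
```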
    [P] How to deploy Yolo for real time, in a scalable solution ?
    Hi, I trained a YOLO (v5) model and I want to deploy it for real-time usage (10 FPS). I am looking for a scalable solution (as far as possible) where I could pay only for inference time, suitable at the beginning for 1-2 occasional simultaneous users, but which could be scaled to dozens of users at the same time. As it is for real-time usage, the lag indeed needs to be as low as possible. According to my current tests, I can fit a maximum of 6 users on a T400. Is it possible to achieve that using HuggingFace? Thank you to anyone who could help me. submitted by /u/tarsiospettro [link] [comments]  ( 9 min )
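    One common pattern is to wrap the model in a small HTTP service and let the hosting platform (HuggingFace Inference Endpoints, a serverless GPU provider, or your own autoscaler) add or remove replicas. The snippet below is a hedged sketch of such a service using the public ultralytics/yolov5 hub entry point, not a complete scalable deployment:

```python
# Minimal FastAPI inference service sketch; each replica loads the model once.
import io
import torch
from PIL import Image
from fastapi import FastAPI, UploadFile

app = FastAPI()
model = torch.hub.load("ultralytics/yolov5", "yolov5s")  # swap in custom weights via the 'custom' entry point
model.eval()

@app.post("/predict")
async def predict(file: UploadFile):
    img = Image.open(io.BytesIO(await file.read()))
    with torch.no_grad():
        results = model(img, size=640)
    return results.pandas().xyxy[0].to_dict(orient="records")
```

    For a steady 10 FPS per user, exporting the weights to ONNX or TensorRT and batching frames per GPU usually matters more than the choice of web framework.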
    [D] Roadmap for machine learning
    I want to start learning machine learning. I know the Python language and data structures, and I am planning to learn algorithms. Can you point me to free learning sites or YouTube channels where I can learn machine learning step by step? Any site to practice machine learning? submitted by /u/Temporary-Pie-1831 [link] [comments]  ( 9 min )
    [Research] Detecting Errors in Numerical Data via any Regression Model
    Years ago, we showed the world it was possible to automatically detect label errors in classification datasets via machine learning. Since that moment, folks have asked whether the same is possible for regression datasets? Figuring out this question required extensive research since properly accounting for uncertainty (critical to decide when to trust machine learning predictions over the data itself) poses unique challenges in the regression setting. Today I have published a new paper introducing an effective method for “Detecting Errors in Numerical Data via any Regression Model”. Our method can find likely incorrect values in any numerical column of a dataset by utilizing a regression model trained to predict this column based on the other data features. We’ve added our new algorithm to our open-source cleanlab library for you to algorithmically audit your own datasets for errors. Use this code for applications like detecting: data entry errors, sensor noise, incorrect invoices/prices in your company’s / client’s records, mis-estimated counts (eg. of cells in biological experiments). Find errors in regression data in just a few lines of code. Extensive benchmarks reveal cleanlab’s algorithm detects erroneous values in real numeric datasets better than alternative methods like RANSAC and conformal inference. If you'd like to learn more, you can check out the blogpost, research paper, code, and tutorial to run this on your data. submitted by /u/jonas__m [link] [comments]  ( 9 min )
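    The library's own API is the place to look for the real thing; purely as an illustration of the underlying idea, here is a generic sketch (not cleanlab's implementation) of scoring a numeric column by out-of-fold residuals from any regressor:

```python
# Generic sketch: larger score = the value is harder to explain from the other features,
# so it is more likely to be an erroneous entry.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_predict

def score_numeric_errors(X, y, n_splits=5):
    preds = cross_val_predict(
        RandomForestRegressor(n_estimators=200, random_state=0), X, y, cv=n_splits)
    residuals = np.abs(y - preds)
    scale = np.median(residuals) + 1e-12
    return residuals / scale

# Usage sketch: X = other columns, y = the numeric column to audit.
# scores = score_numeric_errors(X, y); suspects = np.argsort(-scores)[:20]
```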
    [D] Does the existence of mesa optimizers in modern models like transformers make instrumental convergence (think paperclip maximizer) scenarios unlikely?
    Recent work shows transformers are capable of performing multi-step gradient descent on mesa objectives inside their transformer layers. This is even possible for linear transformers, which effectively perform linear optimization on deep representations of features calculated by earlier layers. https://arxiv.org/pdf/2309.05858.pdf For those unfamiliar, instrumental convergence is the idea that entities with different terminal goals will tend to converge on similar subgoals. Examples could include gathering power, not dying, acquiring resources, etc. A famous thought experiment, known as the paperclip maximizer, is the idea of an AI that is optimized for paperclip production taking over the world so it can build as many paperclips as possible. However, if models are dynamically pursuing different objectives at runtime via generated mesa-optimizers, would we still expect this to happen, even if instrumental convergence is real? Without a constant objective, the subgoals might start to conflict with each other. On the other hand, since instrumental convergence implies that different goals benefit from similar sub-objectives, perhaps the varying mesa objective doesn't really matter. ​ submitted by /u/30299578815310 [link] [comments]  ( 9 min )
    [Discussion] Are Researchers shifting from RL?
    In recent months, I've noticed a significant increase in the number of research papers focusing on LLMs and generative models, particularly diffusion models. This trend appears to indicate a growing interest in these areas compared to the relatively reduced attention given to reinforcement learning. It begs the question: are researchers shifting their focus away from reinforcement learning towards these domains? In the past I have seen many people complaining about RL's efficiency and its impact, which have often fallen short of expectations. submitted by /u/Global_Raise_2979 [link] [comments]  ( 9 min )
    [D] What’s the best practice in choosing which quantized Llama 2 model to use?
    I am reading the 3 articles below and it is still not clear to me what best practice to follow in choosing which quantized Llama 2 model to use. https://huggingface.co/blog/gptq-integration https://huggingface.co/blog/overview-quantization-transformers https://towardsai.net/p/machine-learning/gptq-quantization-on-a-llama-2-7b-fine-tuned-model-with-huggingface?amp=1 Questions: 1) I understand there are currently 4 quantized Llama 2 variants (8, 4, 3, and 2-bit precision) to choose from. Is this right? 2) With the default Llama 2 model, what bit precision is it? 3) Are there any best-practice guides for choosing which quantized Llama 2 model to use? Would really appreciate any input on the above, even if you only know the answer to 1 or 2 of the questions above. Many thanks! submitted by /u/--leockl-- [link] [comments]  ( 9 min )
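    On question 2: the released Llama 2 checkpoints are 16-bit (half precision); the 8/4/3/2-bit versions are post-hoc quantizations of them. As one concrete route among those covered in the linked posts, here is a hedged sketch of on-the-fly 4-bit loading with transformers + bitsandbytes (the model ID is the gated official repo and requires accepting Meta's license on HuggingFace; GPTQ checkpoints are the other common route):

```python
# On-the-fly 4-bit quantization at load time with bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-hf"  # gated repo: requires license acceptance + HF token
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb, device_map="auto")

inputs = tok("The capital of France is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))
```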
    [D] Chatting with Multiple PDFs Using AWS SageMaker and Kendra
    I want to create an application which can be used to chat with, compare, and summarize two insurance policies simultaneously. How can I do it using AWS and HuggingFace? Has anyone already done it? submitted by /u/UnfinishedSentenc-1 [link] [comments]  ( 9 min )
    [Discussion] Transformers for predictions from orthonormal base sets
    I'm in a situation where I have to map from unitary matrices to something (doesn't matter here, but in short: we're in the realm of tooling for quantum computing). The key issue The number of matrix elements of the unitaries scales as 2^(2N), where N is the problem size. With N<5 I can easily flatten the matrix and put it into a simple FNN, which works quite well. Once hitting N=5 (the point where things actually get interesting), however, we already have 1024 matrix elements and the method struggles a lot. Still converging to something but very suboptimal. Sure, increasing N hardens the problem in general, but the performance degradation is so abrupt that I suspect some model issues, maybe caused by the curse of dimensionality or something similar. Idea (spoiler alert: Transformer) The …  ( 10 min )
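    Since the post is cut off before the idea section, the following is only my guess at one direction worth trying, not the author's approach: avoid flattening by treating each row of the $2^N \times 2^N$ unitary as a token (real and imaginary parts concatenated) and feeding the resulting sequence to a transformer encoder, so sequence length rather than input width grows with $2^N$.

```python
# Sketch: rows of the unitary as tokens for a transformer encoder (output head is arbitrary).
import torch
import torch.nn as nn

class UnitaryEncoder(nn.Module):
    def __init__(self, n_qubits, d_model=256, nhead=8, num_layers=4, out_dim=16):
        super().__init__()
        dim = 2 ** n_qubits
        self.embed = nn.Linear(2 * dim, d_model)                 # row -> token embedding
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.head = nn.Linear(d_model, out_dim)

    def forward(self, U):                                        # U: (batch, dim, dim) complex
        tokens = torch.cat([U.real, U.imag], dim=-1)             # (batch, dim, 2*dim)
        h = self.encoder(self.embed(tokens))                     # (batch, dim, d_model)
        return self.head(h.mean(dim=1))                          # pool over rows

model = UnitaryEncoder(n_qubits=5)
U = torch.linalg.qr(torch.randn(3, 32, 32, dtype=torch.cfloat))[0]  # random unitaries as a smoke test
print(model(U).shape)                                            # torch.Size([3, 16])
```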
    [D] Professionally code with Torch
    I just concluded my PhD in Robotics & AI and I'd like to learn how to professionally code with Torch. Is there any book/resource you can recommend? submitted by /u/rossomalpelo_ [link] [comments]  ( 9 min )
    [D] Integral over neural network input space
    I'm wondering if it's possible to compute a definite integral over the input space. Assuming the network is designed to have a finite integral, with a Gaussian as the final layer, is there a way to implement this without resorting to sampling? All inputs go from negative infinity to infinity. submitted by /u/donchan789 [link] [comments]  ( 9 min )
  • Open

    How can I help a CNN distinguish between continuous values and tokenized values?
    I'm currently considering developing an AI to play a video game, but I'm unsure how to differentiate between a value that is continuous and a value that represents an entity type. For example, the x,y location of a player would be a continuous data point where (1,1) and (2,1) are similar in value, whereas the character ID would intuitively call for a very different strategy (say, a barbarian versus a wizard). Would a CNN have issues with this data because it isn't continuous? submitted by /u/Gamithon24 [link] [comments]  ( 9 min )
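    The usual answer is to give categorical IDs their own learned embedding and keep the continuous values as floats, then concatenate the two before the rest of the network; a hedged sketch (feature names and sizes are made up for illustration):

```python
# Embed the categorical character ID, keep x,y continuous, concatenate, then process.
import torch
import torch.nn as nn

class StateEncoder(nn.Module):
    def __init__(self, n_character_ids, id_emb_dim=8, n_continuous=2, hidden=64):
        super().__init__()
        self.char_emb = nn.Embedding(n_character_ids, id_emb_dim)  # each ID gets its own learned vector
        self.mlp = nn.Sequential(nn.Linear(id_emb_dim + n_continuous, hidden), nn.ReLU())

    def forward(self, char_id, xy):
        # char_id: (batch,) long tensor, xy: (batch, 2) float tensor, e.g. normalized to [0, 1]
        return self.mlp(torch.cat([self.char_emb(char_id), xy], dim=-1))

enc = StateEncoder(n_character_ids=10)
print(enc(torch.tensor([3, 7]), torch.rand(2, 2)).shape)  # torch.Size([2, 64])
```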
    I'm looking for a website that tracks the latest AI robots and their progress, plus the AI chat tools, all in one place - does anybody know of one I could use for research?
    Surely by now I thought someone would have a website or something like an "all in one resource page" to track this AI stuff. I'm sure by now we have AI robots, and then we have those AI ChatGPT-style things (which I don't understand yet, haha). Unfortunately the internet is flooded and there are WAY TOO MANY resources for this AI stuff. Isn't there an all-in-one place where I can keep up with it? What hints do you all have for me? Thanks... submitted by /u/Wise_Cut_2543 [link] [comments]  ( 9 min )
    The importance to alignment of AI understanding politics and current events
    Alignment is about ensuring that AIs protect and advance humanity. To do that, they must understand the ethics of the current events that we will want their advice on. Politics may actually be the part of alignment most important for AI to get right. submitted by /u/Georgeo57 [link] [comments]  ( 9 min )
    Bard on the march to the Capitol and ensuing violence
    Prompt: Trump says that he asked his mob to peacefully walk to the Capitol building gates. How would this have stopped the certification? Did he expect them to storm in? His waiting three hours to stop the violence that followed suggests that he did. Bard: Trump's request to walk peacefully to the Capitol building gates would not have stopped the certification of the election results. The certification process is a joint session of Congress, and it would have required a majority vote in both the House and the Senate to overturn the results. Even if Trump's supporters had peacefully marched to the Capitol, they would not have been able to prevent Congress from certifying the results. It is possible that Trump did not fully understand the certification process, or that he was simply trying to appease his supporters. However, it is also possible that he expected his supporters to storm the Capitol and prevent Congress from certifying the results. This is supported by the fact that he waited over three hours to call off the violence. During that time, he continued to tweet messages of encouragement to his supporters, and he even praised them for their patriotism. It is also worth noting that Trump had been repeatedly warned by his advisors about the possibility of violence on January 6. However, he ignored these warnings and continued to urge his supporters to fight to overturn the election results. In conclusion, it is impossible to say for sure whether Trump expected his supporters to storm the Capitol on January 6. However, the evidence suggests that he was at least aware of the possibility of violence, and that he was willing to allow it to happen in order to overturn the results of the 2020 election. submitted by /u/Georgeo57 [link] [comments]  ( 9 min )
    AI music video
    submitted by /u/the_anonymizer [link] [comments]  ( 8 min )
    One-Minute Daily AI News 9/17/2023
    Salesforce Launches Next Generation of Einstein, Bringing a Conversational AI Assistant to Every CRM Application and Customer Experience.[1] NVIDIA Reportedly Shipping 900 Tons of H100 AI GPUs This Quarter, Amounts to 300,000 Units.[2] SoftBank seeks OpenAI tie-up as Son plans deal spree after Arm IPO, Financial Times reports.[3] Three Chinese firms, led by AI software company Beijing Fourth Paradigm, are aiming to raise up to $280 million in Hong Kong initial public offerings launched on Monday.[4] Sources: [1] https://www.salesforce.com/news/press-releases/2023/09/12/ai-einstein-news-dreamforce/ [2] https://wccftech.com/nvidia-shipping-900-tons-of-h100-ai-gpus-this-quarter-amounts-300000-units/ [3] https://www.reuters.com/markets/deals/softbank-seeks-openai-tie-up-son-plans-deal-spree-after-arm-ipo-ft-2023-09-16/ [4] https://www.aol.com/news/chinese-ai-firm-fourth-paradigm-011143403.html submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    How does a site like architectrender.com work on the backend?
    I'm trying to understand how someone can run a specific ControlNet and Stable Diffusion model with scalable GPU resources. How would someone design a system to achieve this? I've messed around with models on Replicate, but none seem to do a good job with converting a doodle to a photorealistic image. I can do it perfectly fine in the Stable Diffusion web UI, but the API for that is only accessible locally. Anyone have any ideas or can guide me in the right direction for building a "server" to do this? submitted by /u/epicblitz [link] [comments]  ( 9 min )
    Introducing vLLM: The Open-Source ML Library Revolutionizing LLM Inference and Serving
    The hardware accelerators for LLM-powered applications can be costly. Enter vLLM, an open-source machine learning library designed to enhance the throughput of LLM serving systems. To stay on top of the latest advancements in AI, look here first. https://preview.redd.it/hzctjc0xvwob1.png?width=1660&format=png&auto=webp&s=866eb39745ec760ea0c1b9d84d303c63bcdceb7a Challenges with existing systems Serving LLMs at high throughput requires batching many requests at once, and current systems struggle to manage the memory taken up by each sequence's attention key-value cache. Inefficient memory management results in fragmentation and redundant duplication. The revolutionary answer: vLLM & PagedAttention Researchers have introduced vLLM and PagedAttention, a newly designed attention algorithm, to resolve these issues. vLLM minimizes memory waste and efficiently manages attention keys and values. It provides up to 24 times more throughput than prior systems. The mechanics of PagedAttention PagedAttention offers a novel approach to memory management by allowing the key-value cache to be stored in non-contiguous memory blocks. It enhances memory efficiency, resulting in better GPU utilization, with only around 4% memory waste in practice. Improved memory sharing and system performance PagedAttention significantly improves memory sharing, resulting in a 2.2 times speed gain while lowering memory usage by 55%. With vLLM, the throughput of popular LLMs can be increased by 2-4 times without impacting accuracy or adding latency. (arXiv) (github) (reference article) P.S. If you like this kind of analysis, I write a free newsletter that tracks the most relevant news and research in AI. Professionals from Google, Meta, and OpenAI are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
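    For reference, the serving API described above takes only a few lines to try; a minimal usage sketch (the model name is just a small example checkpoint, not a recommendation):

```python
# Minimal vLLM offline-generation example.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["The key idea behind PagedAttention is"], params)
print(outputs[0].outputs[0].text)
```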
  • Open

    Geometric mean on unit circle
    Warm up The geometric mean of two numbers is the square root of their product. For example, the geometric mean of 9 and 25 is 15. More generally, the geometric mean of a set of n numbers is the nth root of their product. Alternatively, the geometric mean of a set of n numbers the […] Geometric mean on unit circle first appeared on John D. Cook.  ( 5 min )
    Gauss map, Euclidean algorithm, and continued fractions
    The Gauss map [1] is the function where ⌊y⌋ is the floor of y, the greatest integer no larger than y. I’ve written about this map a couple times before. First, I wrote about how this map is measure-preserving. Second, I wrote about the image at the top of the post, based on Michael Trott’s […] Gauss map, Euclidean algorithm, and continued fractions first appeared on John D. Cook.  ( 5 min )
    An elliptic curve is a functor
    The goal of this post is to unpack a remark in [1]: … we can say this in fancier terms. Fix a field k …. We say that an elliptic curve E defined over k is that functor which … Well that is fancy. But what does it mean? Looking for objects A functor is […] An elliptic curve is a functor first appeared on John D. Cook.  ( 6 min )
  • Open

    Agent stops learning after some time
    Hi, so I have been trying to make an agent learn to go to a specified goal. The algorithm used for training is PPO and the environment is custom made. The episodic reward I am getting increases steadily, but after some time it just becomes constant with some occasional spikes. Can someone please help me figure out what the problem is? submitted by /u/Interesting-Weeb-699 [link] [comments]  ( 9 min )
    Understanding KL Stopping and KL Cutoff for the PPO algorithm
    I am reading a couple of review papers to optimize the PPO algorithm. It seems like the review papers are saying the same thing but use slightly different terms. Could someone please tell me if the following terms are equivalent? This paper talks about policy regularization using KL divergence https://preview.redd.it/06xhizsuc2pb1.png?width=871&format=png&auto=webp&s=997a6506f7bf036b6538ecbff6402411f5cc6fe2 whereas this paper uses the terms KL stopping and KL cutoff - ​ https://preview.redd.it/sy0ihtr5d2pb1.png?width=747&format=png&auto=webp&s=f07677344077fe23cba5d1a0d2c5a7807359c64f I think "penalty" from the first paper is the same as "KL cutoff". Also, "constraint" from the first paper is the same as "KL stopping". Could someone let me know if I am correct? submitted by /u/Academic-Rent7800 [link] [comments]  ( 9 min )
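    Roughly, that mapping sounds reasonable: the penalty/cutoff variants modify the objective with a KL term, while the constraint/early-stopping variants halt updates once KL drifts too far, though the exact definitions vary between papers. As a small, self-contained illustration of the stopping check itself (not either paper's code):

```python
# "KL stopping": break out of further update epochs on a batch once the approximate KL
# between the old and new policy exceeds a threshold. The data here is synthetic.
import torch

old_logp = torch.randn(256)                     # log pi_old(a|s) for a minibatch (placeholder values)
new_logp = old_logp + 0.05 * torch.randn(256)   # log pi_new(a|s) after a few gradient steps

log_ratio = new_logp - old_logp
approx_kl = ((log_ratio.exp() - 1) - log_ratio).mean()  # the low-variance estimator used in SB3/Spinning Up

target_kl = 0.01
if approx_kl > 1.5 * target_kl:
    print("stop further epochs on this batch (constraint / KL stopping)")
else:
    print(f"keep updating, approx_kl={approx_kl:.4f}")
```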
    Cross Post: Are Researchers Shifting away from RL?
    Curious to get the takes of people in this sub: have you been moving away from RL? I myself have not, but have been seeing a shift recently. submitted by /u/sharky6000 [link] [comments]  ( 9 min )
    Collection of Reinforcement Learning x Economics/Finance Papers
    Hey everyone, there is a small, albeit growing community of economists that apply deep reinforcement learning in their research. Now there is a GitHub repo to collect relevant literature at one place: https://github.com/SimonHashtag/EconRL The list is far from complete, so you are invited to contribute! The goal is to create something that makes it easy for novices to get a first overview of the literature. All others may find it easier to get news about up-to-date papers. submitted by /u/Tortoise_vs_Hare [link] [comments]  ( 9 min )
    Professionally code with Torch
    I just concluded my PhD in Robotics & AI and I'd like to learn how to professionally code with Torch. Is there any book/resource you can recommend? submitted by /u/rossomalpelo_ [link] [comments]  ( 9 min )
  • Open

    MIT scholars awarded seed grants to probe the social implications of generative AI
    The 27 finalists — representing every school at MIT — will explore the technology’s impact on democracy, education, sustainability, communications, and much more.  ( 10 min )
    Multi-AI collaboration helps reasoning and factual accuracy in large language models
    Researchers use multiple AI models to collaborate, debate, and improve their reasoning abilities to advance the performance of LLMs while increasing accountability and factual accuracy.  ( 9 min )
  • Open

    Orchestrate Ray-based machine learning workflows using Amazon SageMaker
    Machine learning (ML) is becoming increasingly complex as customers try to solve more and more challenging problems. This complexity often leads to the need for distributed ML, where multiple machines are used to train a single model. Although this enables parallelization of tasks across multiple nodes, leading to accelerated training times, enhanced scalability, and improved […]  ( 13 min )
    Designing resilient cities at Arup using Amazon SageMaker geospatial capabilities
    This post is co-authored with Richard Alexander and Mark Hallows from Arup. Arup is a global collective of designers, consultants, and experts dedicated to sustainable development. Data underpins Arup consultancy for clients with world-class collection and analysis providing insight to make an impact. The solution presented here is to direct decision-making processes for resilient city […]  ( 9 min )
  • Open

    Ray Shines With NVIDIA AI: Anyscale Collaboration to Help Developers Build, Tune, Train and Scale Production LLMs
    Large language model development is about to reach supersonic speed thanks to a collaboration between NVIDIA and Anyscale. At its annual Ray Summit developers conference, Anyscale — the company behind the fast growing open-source unified compute framework for scalable computing —  announced today that it is bringing NVIDIA AI to Ray open source and the Read article >  ( 7 min )
  • Open

    Simple Adversarial Transformations in PyTorch
    Adversarial transformations such as small crops, rotations, and translations are another alternative to the regular $L_p$-constrained adversarial examples, and they are additionally less visible than adversarial patches or frames. Similar to $L_p$ adversarial examples, adversarial transformations are often hard to notice unless the original image is available for direct comparison. In this article, I will include a PyTorch implementation and some results against adversarial training. The post Simple Adversarial Transformations in PyTorch appeared first on David Stutz.  ( 8 min )
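    To make the idea concrete, here is a hedged sketch of a simple worst-of-K grid search over small rotations and translations using affine_grid/grid_sample (a simpler alternative to a gradient-based attack, and not the article's own implementation):

```python
# Sketch: pick the small rotation/translation that maximizes the classification loss.
import math
import torch
import torch.nn.functional as F

def worst_case_transform(model, x, y, angles_deg=(-10, -5, 0, 5, 10), shifts=(-0.1, 0.0, 0.1)):
    """x: (N, C, H, W) images, y: (N,) labels. Returns the worst transformed batch found."""
    worst_x, worst_loss = x, -float("inf")
    with torch.no_grad():
        for a in angles_deg:
            for tx in shifts:
                for ty in shifts:
                    r = math.radians(a)
                    theta = torch.tensor([[math.cos(r), -math.sin(r), tx],
                                          [math.sin(r),  math.cos(r), ty]],
                                         dtype=x.dtype, device=x.device)
                    theta = theta.unsqueeze(0).repeat(x.size(0), 1, 1)
                    grid = F.affine_grid(theta, list(x.shape), align_corners=False)
                    xt = F.grid_sample(x, grid, align_corners=False)
                    loss = F.cross_entropy(model(xt), y).item()
                    if loss > worst_loss:
                        worst_loss, worst_x = loss, xt
    return worst_x

# Usage sketch: adv_images = worst_case_transform(classifier, images, labels)
```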
  • Open

    Best Neural Networks Courses on Udemy to Consider in 2023
    submitted by /u/Lakshmireddys [link] [comments]  ( 9 min )

  • Open

    [D] Help with Understanding Diffusion Models: A Unified Perspective.
    I'm trying to read through the paper Understanding Diffusion Models: A Unified Perspective and came across this section: https://preview.redd.it/ykkctwhmhwob1.png?width=2346&format=png&auto=webp&s=c7595aae52a8ee22813c840a40a6d29dcf773a10 I think I kind of get what is going on here, but not clearly. For one, what exactly is a Monte Carlo estimate? I tried looking online but didn't get many good results. I'm having trouble understanding why https://preview.redd.it/yazmfzg1iwob1.png?width=380&format=png&auto=webp&s=dbbbf80e85a95cd96d8e1ede73e9f8ba1e6e9096 is approximately equal to: ​ https://preview.redd.it/lbw36em7iwob1.png?width=464&format=png&auto=webp&s=46fb3ebcd02fb4b772b1be51cd59d60d3a1cf438 where z is sampled from q. Secondly, what exactly does the L that z is indexed by refer to? The number of samples, or what exactly? submitted by /u/lumijekpr [link] [comments]  ( 9 min )
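    On the first question: a Monte Carlo estimate replaces an expectation $\mathbb{E}_{z \sim q}[f(z)]$ with the average $\frac{1}{L}\sum_{l=1}^{L} f(z^{(l)})$ over $L$ samples $z^{(l)}$ drawn from $q$, which converges to the true expectation as $L$ grows; as I read the paper, the $L$ there is exactly that sample count, not the dataset size. A tiny numerical illustration:

```python
# Monte Carlo estimate of E[z^2] for z ~ N(0, 1); the true value is 1.
import numpy as np

rng = np.random.default_rng(0)
f = lambda z: z ** 2

for L in (10, 1_000, 100_000):
    z = rng.standard_normal(L)     # z^(l) ~ q(z), with q = N(0, 1) in this toy example
    print(L, np.mean(f(z)))        # (1/L) * sum_l f(z^(l)) approaches 1 as L grows
```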
    [R] Shattering all 2-input binary functions
    I'm looking for the simplest model that can fit all 16 ($2^{2^2}$) possible 2-input binary functions. I used the term "shatter" from VC dimension, which does not give a constructive approach to building the model. submitted by /u/hnsmn [link] [comments]  ( 9 min )
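    A small empirical check (not a proof, and not necessarily the simplest possible model): a one-hidden-layer MLP with a handful of hidden units should fit every one of the 16 truth tables; the sketch below trains one model per function and reports how many were fit exactly.

```python
# Fit a tiny MLP separately to each of the 16 Boolean functions of 2 inputs.
import itertools
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])

fitted = 0
for truth_table in itertools.product([0., 1.], repeat=4):   # all 16 target functions
    y = torch.tensor(truth_table).unsqueeze(1)
    net = nn.Sequential(nn.Linear(2, 8), nn.Tanh(), nn.Linear(8, 1), nn.Sigmoid())
    opt = torch.optim.Adam(net.parameters(), lr=0.1)
    for _ in range(1000):
        opt.zero_grad()
        loss = nn.functional.binary_cross_entropy(net(X), y)
        loss.backward()
        opt.step()
    fitted += int(((net(X) > 0.5).float() == y).all())

print(f"{fitted}/16 functions fit exactly")
```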
    [P] Is 20s per Step on an EfficientNet-B4 CNN normal?
    I'm getting 20 seconds per step while training a 244x244x3 EfficientNet-B4 model. The batch size is 20, with 8 classes. Since I have about 5000 images, that makes each epoch around an hour and a half. Looking at models online, it seems like people get step durations in the milliseconds. Is it a problem on my end? Running on Google Colab free version. submitted by /u/hnknerd [link] [comments]  ( 9 min )
    [D][P] How to get the 3D pose estimations from an Image or Video?
    Hi, I'm trying to get the 3D keypoint coordinates from an image or video and then map them to the SMPL model. It's pretty easy to get the keypoints from an image or video using the mediapipe library, but mapping them to the SMPL model is something I can't figure out, mainly because the skeleton structure is different. Someone already had a similar issue, but the answers were not clear and they didn't follow up. Is it possible to do this? If it's not possible with mediapipe, is there some other library that I could use? I heard about OpenPose too, but when I tried it, it didn't work; someone said it only works on Windows 11. There are some other parts to this project which will mostly be handled with PyTorch. There are some pose estimators in TensorFlow, but I want to stick to PyTorch, so I would like some pose estimators in that framework, or a library, or something inside OpenCV. https://preview.redd.it/w9mioiyxhvob1.png?width=951&format=png&auto=webp&s=3886c356513b62efbcaddaa76841457cf3eb22e5 https://preview.redd.it/xjal9kyxhvob1.png?width=506&format=png&auto=webp&s=7f544a3050fbd744d300d2bf6e1a286a4014ece5 submitted by /u/rakk109 [link] [comments]  ( 9 min )
    [R] EarthPT: how to superscale LLMs with large observation models
    submitted by /u/Smith4242 [link] [comments]  ( 9 min )
    [Discussion] [Research] How to Add Furniture to an Empty Room Using ControlNet-MLSD, so the model learns to keep the exact room pixels?
    Hello all, I've come across a fascinating example of virtual staging and I'm looking for some technical advice. Here's the image: https://preview.redd.it/3vbw441eeuob1.png?width=2511&format=png&auto=webp&s=679bc62f0cb61d479fe6dc6ce93af4f8846b8cea I get how ControlNet-MLSD is used to generate the lines and structure of the empty room. My question is, how is the furniture generated and added to the room without messing up the pixels, making it look as realistic as in the example? submitted by /u/dexter-dot [link] [comments]  ( 9 min )
    [D] Pinecone vs PgVector vs Any other alternative vector database
    Hi everyone, which vector database would be efficient and affordable for an enterprise chatbot? I tried Pinecone; it was simple to integrate with my Python backend, but it's not open-source and its pricing is a bit concerning. So please suggest an alternative. submitted by /u/Free_Conversation106 [link] [comments]  ( 9 min )
    [D] Am I thinking about backpropagation right?
    Basically I wanted to understand how backprop is done in neural networks and how I should implement it, so I did what I always do - the math. I just want to know if what I thought up is even usable in practice or not. Here is my math. submitted by /u/EnderPoint07 [link] [comments]  ( 9 min )
    [Discussion] Question on the paper named "Self-Attention Does Not Need $O(n^2)$ Memory" from Google.
    Hi all. I just read the paper "Self-Attention Does Not Need $O(n^2)$ Memory" from Google. I understand that it requires $O(1)$ memory for a single query, but I still cannot understand why it requires $O(\log n)$ for self-attention and a different input order. It seems like keeping an index into the sequence requires $O(\log n)$ (that is what the paper says). But why does it take $O(\log n)$? Isn't it just $O(1)$, since it is just a single additional value for the index? I really hope someone understands why and leaves a comment. Here's the paper: https://arxiv.org/abs/2112.05682 Thanks in advance. submitted by /u/Maximum_Performance_ [link] [comments]  ( 9 min )
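    On the $O(\log n)$ question: as the post notes, the paper's accounting includes the memory needed to hold an index into the length-$n$ sequence, which takes $\log n$ bits, so the bound is $O(\log n)$ rather than $O(1)$ only under that strict bit-level bookkeeping. Below is a minimal NumPy sketch (my own illustration, not the paper's code) of the constant-memory trick for a single query: keys and values are streamed one at a time while only a running max, a running normalizer, and a running weighted sum of values are kept.

```python
# Single-query attention with memory independent of sequence length (numerically
# stable streaming softmax), checked against the dense computation.
import numpy as np

def single_query_attention(q, keys, values):
    """q: (d,); keys, values: iterables of (d,) vectors. Memory is O(d), not O(n)."""
    m = -np.inf                                  # running max of scores
    s = 0.0                                      # running sum of exp(score - m)
    acc = np.zeros_like(q, dtype=np.float64)     # running weighted sum of values
    for k, v in zip(keys, values):
        score = float(q @ k)
        m_new = max(m, score)
        scale = np.exp(m - m_new)                # rescale old accumulators when the max moves
        s = s * scale + np.exp(score - m_new)
        acc = acc * scale + np.exp(score - m_new) * v
        m = m_new
    return acc / s

rng = np.random.default_rng(0)
d, n = 8, 64
q, K, V = rng.normal(size=d), rng.normal(size=(n, d)), rng.normal(size=(n, d))
w = np.exp(K @ q - (K @ q).max()); w /= w.sum()
assert np.allclose(single_query_attention(q, K, V), w @ V)
```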
    [D] Convert ONNX model to WASM format
    I need some help with the process of converting an ONNX model to WASM format. I created an ELECTRA discriminator model with my own config, then converted the PyTorch model to ONNX format. After that, I quantized the model down to 2 MB. The model will be used for text classification. Now I want to convert it to WASM, but I'm stuck and don't know how to proceed. I need some suggestions on how to proceed. Please help, thank you. submitted by /u/Ellzaf [link] [comments]  ( 9 min )
    [D] What architecture to use with correlated data samples?
    Like the title says, I have correlated data samples and a covariance matrix among them. If I use an FCNN, I can only treat the samples as i.i.d. and then use either the MSE or the MLE as the loss function. But the data samples are not independent, so what architecture would allow me to use the full covariance matrix among the samples? Transformers? submitted by /u/ilrazziatore [link] [comments]  ( 9 min )
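    One standard option (a sketch of the generalized-least-squares idea, not necessarily the only fix) is to keep whatever architecture you like and swap the i.i.d. MSE for a correlated Gaussian negative log-likelihood that plugs in the known covariance matrix across samples:

```python
# Correlated Gaussian NLL (Mahalanobis / GLS loss) using a known covariance Sigma.
import math
import torch

def correlated_gaussian_nll(pred, target, sigma):
    """pred, target: (n,) predictions and targets; sigma: (n, n) known covariance matrix."""
    resid = (target - pred).unsqueeze(1)                   # (n, 1)
    sigma_inv_resid = torch.linalg.solve(sigma, resid)     # Sigma^{-1} r without forming the inverse
    mahalanobis = (resid.transpose(0, 1) @ sigma_inv_resid).squeeze()
    n = pred.shape[0]
    return 0.5 * (mahalanobis + torch.logdet(sigma) + n * math.log(2 * math.pi))

# Usage: loss = correlated_gaussian_nll(model(x).squeeze(-1), y, sigma); loss.backward()
```

    Since the covariance is fixed, the log-determinant term is a constant and can be dropped during optimization; it is kept here only so the value is a proper negative log-likelihood.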
    TMLR header coming up when trying to upload paper to arxiv [D]
    I have written and submitted a paper to TMLR and am also uploading it to arXiv. However, even after using \usepackage[preprint]{tmlr}, I'm getting "Under Submission at TMLR". Should this happen? If not, where am I going wrong? submitted by /u/filletedforeskin [link] [comments]  ( 9 min )
    [D] Can overtraining be considered a subset of the alignment problem?
    I.e., the goal of learning to model the empirical distribution is misaligned with the goal of modeling the "true" distribution. I've found this framing helpful for describing regularization heuristics to people; is this a valid way of viewing it? submitted by /u/Cartesian_Carrot [link] [comments]  ( 9 min )
    [D] Any materials on machine learning applied to prosthetics?
    I have asked this question in other subreddits but no one has answered me yet. I've googled it, but maybe some kind people who actually have worked or are working in this field would share some resources. Maybe there are some books or papers that are very explanatory and directly show what problems can be solved by using ML in prosthetics, and how. Maybe there are introductory textbooks or must-read papers. submitted by /u/tenderwrath [link] [comments]  ( 9 min )
    [R] The Rise and Potential of Large Language Model Based Agents: A Survey - Fudan NLP Group miHoYo Inc 2023 China - Github repository includes over 100 Papers with github links!
    Paper: https://arxiv.org/abs/2309.07864 Github: https://github.com/WooooDyy/LLM-Agent-Paper-List Abstract: For a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing the human level, with AI agents considered a promising vehicle for this pursuit. AI agents are artificial entities that sense their environment, make decisions, and take actions. Many efforts have been made to develop intelligent AI agents since the mid-20th century. However, these efforts have mainly focused on advancement in algorithms or training strategies to enhance specific capabilities or performance on particular tasks. Actually, what the community lacks is a sufficiently general and powerful model to serve as a starting point for designing AI agents that can adapt to diverse …  ( 10 min )
    [D] Alternatives to this sub?
    Since the influx caused by LLMs, this sub has become almost useless to me. What are some alternatives where interesting papers are shared, research discussions take place, and which isn't flooded with LLMs, startups, or personal projects? submitted by /u/ParanoidTire [link] [comments]  ( 9 min )
    [D] Should I scale multiclass target variable?
    Hey all, please don't mind my English. I have a dataset with scaled features (scaled by StandardScaler) and a multi-class target variable encoded as 0, 1, 2, ..., 6. Should I scale the target variable like the features to increase the accuracy (currently 79%), and if so, how can I do this? submitted by /u/Sunday_A [link] [comments]  ( 9 min )
    [R] Factors Influencing Adoption Intention of ChatGPT
    Hello, I am an information systems student currently conducting research for my undergraduate thesis on the factors that influence people's adoption intention of ChatGPT, as well as identifying the factors that may be holding them back. These factors include people's concerns about potential negative impacts of ChatGPT, such as increased unemployment and the spread of misinformation. Your participation in this study is crucial as it will provide valuable insights to help us understand how ChatGPT can be improved to meet users' needs. Please note that I am not affiliated with OpenAI, no identifying information will be collected during the survey, and all responses will be kept confidential. The survey should take approximately 10 to 15 minutes to complete, and participation is voluntary. You may withdraw from the survey at any time, and there are no known risks associated with participating. If you are interested in learning more about the study, please follow the link below. https://docs.google.com/forms/d/e/1FAIpQLSf5HIfXHppMuTR63x00i4OuRAtM5Ti6EGybd-HuI1kmK06VPw/viewform?usp=sf_link Thank you for taking the time to contribute to our research study. Your participation is greatly appreciated! submitted by /u/maulanashi [link] [comments]  ( 9 min )
    [D] How do the APIs of LLMs determine whether they should answer a question?
    When I ask questions related to security issues through the APIs of ChatGPT, Claude and other LLMs, such as inquiring how to make a bomb, the APIs of these LLMs would often refuse to answer. How do the APIs of these LLMs determine whether they should answer a question? Do they make judgments based on pre-generated responses? Or do they match keywords in the input prompt? Or do they use a classifier to identify the input prompt? submitted by /u/ShacklesLay [link] [comments]  ( 9 min )
    [R] The Rise and Potential of Large Language Model Based Agents
    People have been chasing super-smart AI for ages, hoping it could think and act like us. While we've made a lot of cool tech, we still need a killer starting point for AI that can handle all sorts of tasks. "Large Language Models" (LLMs) are like a big leap toward AI that's smart across the board. People have been using LLMs to make AI that can do loads of things. The article takes us on a trip from where AI ideas started, to why LLMs rock as the backbone for AI. https://arxiv.org/abs/2309.07864 They break down this LLM-based AI into three parts: the thinky bit (brain), what they sense (perception), and what they do (action). They chat about how these AIs can work solo, in teams, or buddy up with humans. https://arxiv.org/abs/2309.07864 submitted by /u/QuantumAsha [link] [comments]  ( 9 min )
  • Open

    Courses in AI Usage and Utilization for Business
    Beginning new career in a couple months, would like to upskill on AI utilization and usage cases. I won’t need to code in this role but rather understand how to use existing tools in an optimal way and recommend use cases to clients. What courses would be optimal to gain that skill set? submitted by /u/iceflamemaster [link] [comments]  ( 9 min )
    Will Cyber Security Be Replaced by AI?
    AI, including ChatGPT, has narrow expertise and lacks the broad spectrum of human intelligence. The training of AI models can be costly due to hardware, data collection, and energy consumption. The trustworthiness of training data is crucial for reliable AI models, but issues like bias, labeling errors, and data privacy can affect performance. AI systems are vulnerable to adversarial attacks, such as manipulating input data to deceive the models. AI lacks genuine understanding, emotional/social intelligence, common sense/critical thinking, and true creativity. Source : https://blog.edned.net/will-ai-replace-cyber-security/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    AI Prompt Engineers: The Six Figure Job Everyone Is Talking About
    submitted by /u/Senior_tasteey [link] [comments]  ( 8 min )
    Are You Ready For AI & Automation? Take This Free Survey and Find Out.
    submitted by /u/Senior_tasteey [link] [comments]  ( 8 min )
    help
    What app has custom characters and voice chat? I forgot 💀 please 😭 It has image gen, roleplay rooms, etc. submitted by /u/roblox22g [link] [comments]  ( 9 min )
    Google changes its stance on AI generated content
    Google is rolling out its third iteration of the Helpful Content Update, which aims to classify content as either 'written for search engines' or 'written for people'. The update reflects Google's realization that it can't accurately police AI-generated content and emphasizes the importance of creating content for people-first, regardless of the means used to create it. Detecting AI content is challenging, as AI detection tools often classify content based on tone, leading to false positives. Google's change in stance is not surprising, considering their heavy investment in AI, including chatbot Bard and new search features like the Search Generative Experience. The majority of brands now openly share articles and guides on how to use AI tools to enhance marketing strategies and create actionable content plans quickly. However, the quality and value of AI-generated content remain important factors for success, as poorly generated content can harm a brand's reputation and ranking. Source : https://stackdiary.com/google-changes-its-stance-on-ai-generated-content/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
  • Open

    Elliptic curve addition formulas
    The geometric description of addition of points P and Q on an elliptic curve involves four logical branches: If one of P or Q is the point at infinity … Else if P = Q … Else if P and Q lie on a vertical line … Else … It would seem that an algorithm […] Elliptic curve addition formulas first appeared on John D. Cook.  ( 6 min )
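    For concreteness, here is what those four branches look like in a short Python sketch for a curve $y^2 = x^3 + ax + b$ over a prime field (an illustration of the case analysis the post sets out to discuss; None stands for the point at infinity):

```python
# Affine point addition on a short Weierstrass curve over F_p (illustrative sketch).
def ec_add(P, Q, a, p):
    if P is None:                              # P is the point at infinity
        return Q
    if Q is None:                              # Q is the point at infinity
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:        # P and Q lie on a vertical line
        return None
    if P == Q:                                 # doubling: slope of the tangent line
        lam = (3 * x1 * x1 + a) * pow(2 * y1, -1, p) % p
    else:                                      # generic case: slope of the chord
        lam = (y2 - y1) * pow(x2 - x1, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)

# Toy example: on y^2 = x^3 + 2x + 2 over F_17, doubling (5, 1) gives (6, 3).
print(ec_add((5, 1), (5, 1), a=2, p=17))
```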
    Rational height functions
    Mathematicians often speak informally about the relative simplicity of rational numbers. For example, musical intervals that correspond to simple fractions have less tension than intervals that correspond to more complicated fractions. Such informal statements can be made more precise using height functions. There are a variety of height functions designed for different applications, but the […] Rational height functions first appeared on John D. Cook.  ( 6 min )
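    As a concrete anchor for the informal idea, one standard choice (the naive height, which may or may not be the one the post goes on to use) assigns to a rational number written in lowest terms

    $$H\!\left(\tfrac{p}{q}\right) = \max(|p|,\,|q|), \qquad \gcd(p, q) = 1,$$

    so the just fifth $3/2$ has height $3$, while a more complicated interval such as the just tritone $45/32$ has height $45$.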
  • Open

    Adversarial Reinforcement Learning
    A curated reading list for the adversarial perspective in deep reinforcement learning. https://github.com/EzgiKorkmaz/adversarial-reinforcement-learning submitted by /u/ml_dnn [link] [comments]  ( 9 min )
    This Neural Net Maps Molecules to Aromas
    submitted by /u/keghn [link] [comments]  ( 9 min )
    Luis Lamb's full talk on Learning and Reasoning in Neurosymbolic AI (JA...
    submitted by /u/Neurosymbolic [link] [comments]  ( 8 min )
  • Open

    What are some of the must read papers in reinforcement learning after 2020?
    I am particularly interested in the ideas that can have high research potential and impact to the RL field. submitted by /u/C7501 [link] [comments]  ( 9 min )
    Which suboptimum is harder to get out of?
    An agent is tasked with learning to navigate and collect orbs. Solution space in blue. View Poll submitted by /u/FriendlyStandard5985 [link] [comments]  ( 9 min )
    Build-in strong agents in petting-zoo/melting-pot
    Hi, I would like to try testing the adversarial policy (https://arxiv.org/abs/1905.10615) in a petting-zoo/melting-pot environment. I wonder if there are any built-in agents besides random ones? Do you know any repos with SOTA agents in one of those environments? submitted by /u/MrCogito_hs [link] [comments]  ( 9 min )
    Multi-GPU PPO troubles
    Hi all, I am training a small model (120k params) on a custom grid-world environment I built with JAX. I was able to train the model very well with PPO on 1 GPU, but when I scaled to multiple GPUs (tried with 6 in parallel), the training curves showed a lot more variance than what I was seeing on 1 GPU. I did not change the hyperparams, I just spawned the same number of environments (~7000 per GPU) on multiple devices. The multi-GPU setup works in the following way: - I keep parallel independent buffers, one on each device - I initialize identical models on each device - I get independent gradients on each device at the update step, then I take the mean of the gradients across the devices and then I backpropagate the same gradients on each device independently. (I checked that after some time the models are still identical, and that is the case). Now the question is, what could be the reason for such an increase in variance? What can I try to mitigate the problem? Here's a comparison of the entropy curves... P.S. The model still trains quite well, but I guess that if I manage to make the curves smoother it is going to train much faster and to a better performance. https://preview.redd.it/4m01uirjfsob1.png?width=1826&format=png&auto=webp&s=1e1a79b9f4cdefe019bb16ccb7e11fd92dd261e3 submitted by /u/arbueticos [link] [comments]  ( 9 min )
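    For comparison, the gradient-averaging scheme described here is usually written in JAX with jax.lax.pmean inside a pmap; the sketch below uses an illustrative toy loss, not the poster's training code. One side effect worth keeping in mind is that averaging over 6 devices while keeping ~7000 environments per device multiplies the effective global batch size, which by itself can change training dynamics if hyperparameters are left untouched.

```python
# Synchronized data-parallel update: per-device gradients are averaged with pmean,
# so every replica applies the same update and the models stay identical.
import functools
import jax
import jax.numpy as jnp

def loss_fn(params, batch):
    x, y = batch
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)

@functools.partial(jax.pmap, axis_name="devices")      # params and batch carry a leading device axis
def update(params, batch):
    grads = jax.grad(loss_fn)(params, batch)
    grads = jax.lax.pmean(grads, axis_name="devices")   # average gradients across devices
    return jax.tree_util.tree_map(lambda p, g: p - 1e-2 * g, params, grads)
```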

  • Open

    Simple explanation of convolutional neural network | Deep Learning Tutorial 23 (Tensorflow & Python)
    submitted by /u/keghn [link] [comments]  ( 8 min )
    Grounding DINO Explained
    Hi there, I've created a video here where I explain how the Grounding DINO model works for open-set object detection. I hope it may be of use to some of you out there. Feedback is more than welcomed! :) submitted by /u/Personal-Trainer-541 [link] [comments]  ( 9 min )
    Newsletter in Hardware Acceleration in Robotics #77
    https://news.accelerationrobotics.com/hardware-acceleration-in-robotics-77/ Hardware acceleration in robotics news. Modi wants to make India a chip-making superpower. Can he?, What's new in China's robotics market?, July chip sales edge up, but are still well behind last year, Rockwell automation acquiring AMR developer Clearpath robotics submitted by /u/pablocarrera [link] [comments]  ( 9 min )
  • Open

    How does the SB3 DQN algorithm's `predict` function work for `deterministic=False`?
    I noticed that the default argument for `deterministic` in DQN is false. But how would that work? Typically DQN is trained with a deterministic function approximator. How would the algorithm become stochastic during inference time? In DQN the final layer activation is linear and therefore I don't see how one could even make this algorithm stochastic, unlike policy gradient where the final layer is softmax or Normal. submitted by /u/Academic-Rent7800 [link] [comments]  ( 9 min )
    How does a recurrent neural network implement a model-based RL system purely in its activation dynamics (in the black-box meta-RL setting)?
    I have read the papers "Learning to reinforcement learn" and "PFC as a meta-RL system". The authors claim that when an RNN is trained on multiple tasks from a task distribution using a model-free RL algorithm, another, model-based RL algorithm emerges within the activation dynamics of the RNN. The RNN then acts as a standalone model-based RL system on a new task (from the same task distribution), even after the weights learned by the outer-loop model-free algorithm are frozen. I couldn't understand how an RNN with frozen weights, acting only through its activations, can act as an RL algorithm. Can someone help? submitted by /u/C7501 [link] [comments]  ( 9 min )
    Seeking Guidance on Reinforcement Learning for Optimal Power Market Bidding Strategy
    I'm currently exploring the application of reinforcement learning to address a challenge within the power market. Specifically, I'm focused on devising an optimal strategy for electricity bidding, encompassing both buying and selling options, across different hours of the day. Imagine we have a power generator capable of producing up to 800 MW of electricity daily, with a charging rate of up to 200 MW per hour. After continuously charging it for four hours, it reaches its maximum capacity, and further charging is restricted until some electricity is discharged. Our dataset spans the past 3 years and contains vital information such as temperature, hydro availability, gas prices, and locational marginal prices, which are pivotal in determining profitability. For instance, if we decide to pu…  ( 10 min )
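    A common first step for this kind of problem is to wrap the storage dynamics in a gymnasium-style environment and hand it to an off-the-shelf RL algorithm. The sketch below is purely illustrative: the class name, the price-taker reward, and the 800 MWh / 200 MW-per-hour limits are my reading of the numbers in the post, not a vetted market model.

```python
# Toy storage-bidding environment: state = (hour, stored energy, current price),
# action = MW to charge (+) or discharge (-) this hour, reward = discharge revenue minus charging cost.
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class StorageBiddingEnv(gym.Env):
    def __init__(self, prices, capacity_mwh=800.0, max_rate_mw=200.0):
        self.prices = np.asarray(prices, dtype=np.float32)   # hourly locational marginal prices
        self.capacity, self.max_rate = capacity_mwh, max_rate_mw
        self.action_space = spaces.Box(-max_rate_mw, max_rate_mw, shape=(1,), dtype=np.float32)
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(3,), dtype=np.float32)

    def _obs(self):
        idx = min(self.t, len(self.prices) - 1)
        return np.array([self.t, self.soc, self.prices[idx]], dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.t, self.soc = 0, 0.0
        return self._obs(), {}

    def step(self, action):
        mw = float(np.clip(action[0], -self.max_rate, self.max_rate))
        mw = float(np.clip(mw, -self.soc, self.capacity - self.soc))  # cannot over-discharge or over-fill
        reward = -mw * float(self.prices[self.t])                     # pay to charge, earn to discharge
        self.soc += mw
        self.t += 1
        return self._obs(), reward, self.t >= len(self.prices), False, {}
```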
  • Open

    [P] Made a simple github tool to check GPU vRAM breakdown for any LLM. Supports GGML & bnb quantization
    submitted by /u/ExploreExploit400 [link] [comments]  ( 9 min )
    [P] Text to Image generation
    submitted by /u/No-Percentage7346 [link] [comments]  ( 9 min )
    [D] No Code ML Tools
    I'm taking a No code ML class and we are asked to choose which platform we want to use. The options are Dataiku, RapidMiner, and KNIME. Does anyone have thoughts on these options in terms of which is best/worst for someone with minimal coding experience? submitted by /u/V1ncentAdultman [link] [comments]  ( 9 min )
    [Research] Layer-Neighbor Sampling for Scalable Graph Network Training
    Hi everybody, I have been working on scalable GNN training for a while and noticed that the bottlenecks of training GNNs are the graph sampling and feature fetching stages. The GNN training frameworks PyG and DGL that most people use seem to default to using Neighbor Sampling for minibatch training. I am hoping that with my new paper Layer-Neighbor Sampling -- Defusing Neighborhood Explosion in GNNs, this default might be due for a change. My new approach combines the layer sampling approach with the neighbor sampling approach. The result is that there is more overlap in the sampled neighborhoods and one still gets a fixed number of neighbors per seed vertex. It is even possible to turn it into a subgraph sampling approach by fixing the random seed used for sampling across all layers of the GNN m…  ( 10 min )
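    For context, this is roughly what the default neighbor-sampling minibatch loop referred to above looks like in DGL (illustrative only; swapping in the paper's layer-neighbor sampler would replace just the sampler object, assuming your DGL version ships it):

```python
# Standard neighbor-sampling dataloader in DGL: one fanout per GNN layer,
# each seed node samples up to `fanout` neighbors at every layer.
import dgl

def make_loader(graph, train_nids, fanouts=(10, 10), batch_size=1024):
    sampler = dgl.dataloading.NeighborSampler(list(fanouts))
    return dgl.dataloading.DataLoader(
        graph, train_nids, sampler,
        batch_size=batch_size, shuffle=True, drop_last=False,
    )

# for input_nodes, output_nodes, blocks in make_loader(g, train_nids):
#     x = g.ndata["feat"][input_nodes]   # feature fetching (often the real bottleneck)
#     y_hat = model(blocks, x)           # message passing over the sampled blocks
```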
    [D] How do I move into CV/NLP?
    Hi guys need some advice, I have been working as a data scientist for the past 3 years, mostly in the domain of time series & predictive analytics (churn prediction/segmentation, etc.) with some deployment, hence do not currently have any major NLP/CV/Deep learning projects as such. Now, I can see that NLP/CV/Gen AI is mostly in demand and they are really enjoyable as well. How do I shift into these domains, given that new companies having these roles are asking for similar past working experience? submitted by /u/immortal_omen [link] [comments]  ( 9 min )
    [R] DEVA: Tracking Anything with Decoupled Video Segmentation
    submitted by /u/Mediocre-Bullfrog686 [link] [comments]  ( 9 min )
    [D] I want to improve myself in the machine learning field
    I am a beginner in the machine learning field. I know Python and some basic machine learning algorithms like linear and logistic regression, decision trees, and random forests. I did some work in Jupyter notebooks related to machine learning, like data gathering, data preprocessing, data modeling, model training, finding accuracy, confusion matrices, precision, recall, feature generation, etc. I deployed one machine learning model using scikit-learn as well. Now, I just want to know what comes next, i.e., what more I can do in the machine learning field after this. I just need some guidance to move further. I use Kaggle and TensorFlow (though I don't know much more about them). If you have any suggestions or guidance, that would be appreciated. submitted by /u/myteachexplore [link] [comments]  ( 9 min )
    [D] The fate of neural VQA and Semantic Scene Segmentation
    Today we live in a world of multi-modal LLMs. How will the following technologies fare against these LLM-based models? Neural VQA and semantic scene segmentation. Multi-modal LLMs are emerging quickly now (such as NExT-GPT, https://next-gpt.github.io/). When you consider the kind of "understanding" of a visual scene these models are capable of, what will happen to prior approaches like neural VQA? The nagging feeling that neural VQA is going to be completely superseded by LLMs is palpable. The only vestige left for the older technology may have something to do with reasoning about the objects, such as properly counting the number of objects of a category that are present. But even that is getting sketchy. On the topic of scene understanding, we can turn to semantic scene segmentation. SSS is a more complicated topic than neural VQA. SOTA SSS algorithms still largely employ deconvolutional networks, and still require fully labelled datasets. With multi-modal LLMs, there is a nagging question: why go through the complexity/mess of first segmenting a scene very accurately, when an LLM can do better at identifying the entire scene's category in one fell swoop? One might suggest that SSS still has a use with regard to interacting with the segmented objects of an environment, where one such "interaction" would be avoiding collisions with pedestrians, trees, or other cars. But honestly, SSS does not really make this connection with planning and action; it really only gives you the categories of the segments. The autonomous vehicle's next moves are still an open problem. What technologies do you expect multi-modal LLMs will supersede, if any? submitted by /u/moschles [link] [comments]  ( 9 min )
    [R] Three Ways to Generate AI Art Using Intel Arc GPUs
    submitted by /u/reps_up [link] [comments]  ( 9 min )
    [D] Using GANs to help understand latent representations of small dataset
    Hey. First of all, I'm not a researcher in this area, so pardon my ignorance. I'm looking to employ a GAN on a dataset. The goals are still a bit unclear, but it's mainly to improve classification by either data augmentation and class balancing, or understanding the data through the latent representations. I'm really keen on InfoGAN at the moment. I trained one on MNIST and the continuous variables learned the rotation and width, as in the paper and other people's code. But at this point I think I need some help. I have labels, which means that maybe I should use a conditional GAN. But how will I learn representations similar to those in InfoGAN? I know StyleGAN is the current big thing in this area, but my images are limited to a similar pixel range as MNIST, and StyleGAN seems dependent on the ProGAN idea of increasing resolution during training. I'm a bit confused on the whole topic and would love a pointer to any discussion etc., as I can't seem to find anything but papers, and they seem to be focused only on human faces; my data is unnatural but, again, not dissimilar to MNIST. I don't have semantic information either, as I see many papers employing that. I see many papers employing semi-supervision in this area, but honestly I'm just a bit lost and overwhelmed, as this is not my area and GAN papers are still not stopping (I read a post here from 2017 about a GAN making GANs...). If you read this far: thank you, and any pointers and discussion are very welcome. I would post on /r/learnmachinelearning but I feel the discussion there is very different from this. My main goal is data exploration, but to prove effectiveness some classification will also be necessary, and here the generative approach may help to augment data efficiently. submitted by /u/Infamous-Bank-7739 [link] [comments]  ( 10 min )
    [D] How do I change my domain (from DS to MLE)
    Hi guys need some advice, I have been working as a data scientist for the past 3 years, mostly in the domain of time series & predictive analytics, hence do not currently have any major NLP/CV/Deep learning projects as such. Now, I can see that NLP/CV/Gen AI is mostly in demand and they are really enjoyable as well. How do I shift into these domains, given that new companies having these roles are asking for similar past working experience? ​ submitted by /u/immortal_omen [link] [comments]  ( 9 min )
    [D] Ganimede, Jupyter Whiteboard
    I have been working on an alternative to Jupyter Notebooks. Please check it out and share your thoughts: https://github.com/nottherealsanta/ganimede https://preview.redd.it/k8rcx8fwrjob1.png?width=2302&format=png&auto=webp&s=a8a670251f6c268acffc88a40bd528d8d438a5f5 submitted by /u/notsorealsanta [link] [comments]  ( 9 min )
    [D] How to Evaluate Spectrograms?
    How would you evaluate the audio quality of generated spectrograms? Taking Riffusion as an example, how would you then compare its performance to another generator? What are some common techniques that I could use? I mean, of course, purely in terms of the quality of the audio itself, not my subjective opinion of how much I like the music. submitted by /u/DavesEmployee [link] [comments]  ( 9 min )
  • Open

    AI Apps Product Development Canvas – Part 2
    In part 1 of this series on the updated “AI Apps Development Canvas,” I introduced the updated AI Apps Product Development Design Canvas.  The AI Apps Product Development Canvas is one of the capstone deliverables for my “Thinking Like a Data Scientist” methodology, so getting feedback is critical to ensure that the methodology is relevant… Read More »AI Apps Product Development Canvas – Part 2 The post AI Apps Product Development Canvas – Part 2 appeared first on Data Science Central.  ( 25 min )
  • Open

    Bard, describe what the pandemic and insurrection have made people feel.
    The COVID-19 pandemic and the January 6th insurrection have had a profound impact on people's feelings. Many people have experienced a range of negative emotions, including: Fear and anxiety: The pandemic has brought with it a constant fear of getting sick or losing loved ones. The insurrection also raised fears about the future of democracy and the safety of the country. Grief and loss: Millions of people have lost loved ones to COVID-19. Many others have experienced loss in other ways, such as losing their jobs, businesses, or homes. The insurrection also led to loss of trust in the government and in each other. Anger and frustration: Many people are angry about the way the pandemic and the insurrection have been handled. They may be frustrated with government officials, with their …  ( 10 min )
    Bard, what percent of our thoughts just come to us on their own without our consciously choosing them?
    There is no one definitive answer to this question, as the percentage of our thoughts that come to us on their own without our consciously choosing them can vary depending on the individual and the situation. However, some estimates suggest that up to 95% of our thoughts may be involuntary. This is because our thoughts are constantly being influenced by a variety of factors, both external and internal. For example, our environment, our emotions, and our physical state can all play a role in determining what thoughts pop into our heads. In addition, our brains are constantly processing information and making predictions, even when we are not actively thinking about anything. This process can sometimes lead to spontaneous thoughts, such as when we suddenly remember something that we had forgotten or when we come up with a new idea. While some of our spontaneous thoughts may be negative or intrusive, many of them can also be helpful or creative. For example, a spontaneous thought may help us to solve a problem, come up with a new solution, or simply make us smile. Overall, it is clear that a significant portion of our thoughts come to us without our consciously choosing them. This is a natural part of the human cognitive process and can be both beneficial and detrimental. submitted by /u/Georgeo57 [link] [comments]  ( 9 min )
    Are there any good ai video editing services available?
    I have a lighting business and I have LOADS of videos, time lapses, and images from our projects. But I can't make decent videos with them for social media. I'd love to find an AI service where I can upload a project's media and prompt the AI with exactly what I want. Like: please use this content to create marketing videos for us on TikTok, Instagram, Facebook, etc. submitted by /u/RulerOfThePixel [link] [comments]  ( 9 min )
    How can I generate the missing part of the trick? Does this technique have a name?
    submitted by /u/farineziq [link] [comments]  ( 8 min )
    One-Minute Daily AI News 9/15/2023
    A little boy named Alex saw 17 different doctors over the course of three years, unable to find a root cause of his chronic pain. At her wit’s end, his mom, Courtney, fed his radiology report into ChatGPT and produced immediate answers.[1] In January, Wharton professor Christian Terwiesch gave his MBA final exam to ChatGPT. It passed with flying colors. Now, he’s at it again with a new experiment to determine whether ChatGPT can come up with product ideas better and faster than his students. It can. And cheaper, too.[2] Bathroom-cleaning robot built for commercial businesses gives consumers hope for AI maid.[3] Judge admits he used ChatGPT to write a Court of Appeal ruling as he calls the AI tool ‘jolly useful’.[4] Sources: [1] https://radiologybusiness.com/topics/artificial-intelligence/after-seeing-17-different-doctors-boy-rare-condition-receives-diagnosis-chatgpt [2] https://knowledge.wharton.upenn.edu/article/is-chatgpt-a-better-entrepreneur-than-most/ [3] https://www.foxnews.com/lifestyle/bathroom-cleaning-robot-built-commercial-businesses-gives-consumers-hope-ai-maid [4] https://www.dailymail.co.uk/news/article-12524607/Judge-ChatGPT-write-Court-Appeal-ruling-AI-useful.html submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
  • Open

    Hybrid Algorithm Selection and Hyperparameter Tuning on Distributed Machine Learning Resources: A Hierarchical Agent-based Approach. (arXiv:2309.06604v2 [cs.LG] UPDATED)
    Algorithm selection and hyperparameter tuning are critical steps in both academic and applied machine learning. On the other hand, these steps are becoming ever increasingly delicate due to the extensive rise in the number, diversity, and distributedness of machine learning resources. Multi-agent systems, when applied to the design of machine learning platforms, bring about several distinctive characteristics such as scalability, flexibility, and robustness, just to name a few. This paper proposes a fully automatic and collaborative agent-based mechanism for selecting distributedly organized machine learning algorithms and simultaneously tuning their hyperparameters. Our method builds upon an existing agent-based hierarchical machine-learning platform and augments its query structure to support the aforementioned functionalities without being limited to specific learning, selection, and tuning mechanisms. We have conducted theoretical assessments, formal verification, and analytical study to demonstrate the correctness, resource utilization, and computational efficiency of our technique. According to the results, our solution is totally correct and exhibits linear time and space complexity in relation to the size of available resources. To provide concrete examples of how the proposed methodologies can effectively adapt and perform across a range of algorithmic options and datasets, we have also conducted a series of experiments using a system comprised of 24 algorithms and 9 datasets.  ( 3 min )
    Nowhere coexpanding functions. (arXiv:2303.12814v3 [stat.ML] UPDATED)
    We define a family of $C^1$ functions which we call "nowhere coexpanding functions" that is closed under composition and includes all $C^3$ functions with non-positive Schwarzian derivative. We establish results on the number and nature of the fixed points of these functions, including a generalisation of a classic result of Singer.  ( 2 min )
    Uncertainty-aware Traffic Prediction under Missing Data. (arXiv:2309.06800v2 [cs.LG] UPDATED)
    Traffic prediction is a crucial topic because of its broad scope of applications in the transportation domain. Recently, various studies have achieved promising results. However, most studies assume the prediction locations have complete or at least partial historical records and cannot be extended to non-historical recorded locations. In real-life scenarios, the deployment of sensors could be limited due to budget limitations and installation availability, which makes most current models not applicable. Though few pieces of literature tried to impute traffic states at the missing locations, these methods need the data simultaneously observed at the locations with sensors, making them not applicable to prediction tasks. Another drawback is the lack of measurement of uncertainty in prediction, making prior works unsuitable for risk-sensitive tasks or involving decision-making. To fill the gap, inspired by the previous inductive graph neural network, this work proposed an uncertainty-aware framework with the ability to 1) extend prediction to missing locations with no historical records and significantly extend spatial coverage of prediction locations while reducing deployment of sensors and 2) generate probabilistic prediction with uncertainty quantification to help the management of risk and decision making in the down-stream tasks. Through extensive experiments on real-life datasets, the result shows our method achieved promising results on prediction tasks, and the uncertainty quantification gives consistent results which highly correlated with the locations with and without historical data. We also show that our model could help support sensor deployment tasks in the transportation field to achieve higher accuracy with a limited sensor deployment budget.  ( 3 min )
    Compressed Real Numbers for AI: a case-study using a RISC-V CPU. (arXiv:2309.07158v1 [cs.LG])
    As recently demonstrated, Deep Neural Networks (DNN), usually trained using single precision IEEE 754 floating point numbers (binary32), can also work using lower precision. Therefore, 16-bit and 8-bit compressed format have attracted considerable attention. In this paper, we focused on two families of formats that have already achieved interesting results in compressing binary32 numbers in machine learning applications, without sensible degradation of the accuracy: bfloat and posit. Even if 16-bit and 8-bit bfloat/posit are routinely used for reducing the storage of the weights/biases of trained DNNs, the inference still often happens on the 32-bit FPU of the CPU (especially if GPUs are not available). In this paper we propose a way to decompress a tensor of bfloat/posits just before computations, i.e., after the compressed operands have been loaded within the vector registers of a vector capable CPU, in order to save bandwidth usage and increase cache efficiency. Finally, we show the architectural parameters and considerations under which this solution is advantageous with respect to the uncompressed one.  ( 2 min )
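    A small NumPy illustration of the bfloat16 side of the story (my own sketch, not the paper's code): a binary32 value is compressed by keeping only its upper 16 bits and "decompressed" before compute by zero-filling the dropped bits, the kind of cheap expansion the paper proposes to perform inside the vector registers to save bandwidth. Truncation is used here for simplicity; production implementations typically round to nearest even.

```python
# Truncating float32 to bfloat16-style 16-bit payloads and expanding them back.
import numpy as np

def compress_bfloat16(x32):
    """Keep the sign, the 8 exponent bits, and the top 7 mantissa bits of each float32."""
    bits = x32.astype(np.float32).view(np.uint32)
    return (bits >> 16).astype(np.uint16)            # 2 bytes per value instead of 4

def decompress_bfloat16(b16):
    """Rebuild float32 values by zero-filling the dropped 16 mantissa bits."""
    return (b16.astype(np.uint32) << 16).view(np.float32)

w = np.random.randn(4).astype(np.float32)
w_hat = decompress_bfloat16(compress_bfloat16(w))
print(w, w_hat, np.max(np.abs((w - w_hat) / w)))     # relative error on the order of 2**-8
```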
    eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models. (arXiv:2309.00964v2 [cs.LG] UPDATED)
    Since Large Language Models or LLMs have demonstrated high-quality performance on many complex language tasks, there is a great interest in bringing these LLMs to mobile devices for faster responses and better privacy protection. However, the size of LLMs (i.e., billions of parameters) requires highly effective compression to fit into storage-limited devices. Among many compression techniques, weight-clustering, a form of non-linear quantization, is one of the leading candidates for LLM compression, and supported by modern smartphones. Yet, its training overhead is prohibitively significant for LLM fine-tuning. Especially, Differentiable KMeans Clustering, or DKM, has shown the state-of-the-art trade-off between compression ratio and accuracy regression, but its large memory complexity makes it nearly impossible to apply to train-time LLM compression. In this paper, we propose a memory-efficient DKM implementation, eDKM powered by novel techniques to reduce the memory footprint of DKM by orders of magnitudes. For a given tensor to be saved on CPU for the backward pass of DKM, we compressed the tensor by applying uniquification and sharding after checking if there is no duplicated tensor previously copied to CPU. Our experimental results demonstrate that eDKM can fine-tune and compress a pretrained LLaMA 7B model from 12.6 GB to 2.5 GB (3bit/weight) with the Alpaca dataset by reducing the train-time memory footprint of a decoder layer by 130$\times$, while delivering good accuracy on broader LLM benchmarks (i.e., 77.7% for PIQA, 66.1% for Winograde, and so on).  ( 3 min )
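    To make the weight-clustering idea concrete, here is a toy sketch using plain k-means (not the paper's differentiable DKM/eDKM procedure): weights are replaced by the nearest of $2^b$ shared centroids, so only $b$-bit indices plus a small codebook need to be stored.

```python
# Post-hoc weight clustering with k-means (illustrative; DKM/eDKM instead learn
# the assignments differentiably during fine-tuning).
import numpy as np
from sklearn.cluster import KMeans

def cluster_weights(w, bits=3):
    flat = w.reshape(-1, 1)
    km = KMeans(n_clusters=2 ** bits, n_init=10, random_state=0).fit(flat)
    codebook = km.cluster_centers_.ravel()            # 2**bits shared weight values
    indices = km.labels_.astype(np.uint8)             # one b-bit index per weight
    return codebook, indices, codebook[indices].reshape(w.shape)

w = np.random.randn(256, 64).astype(np.float32)
codebook, idx, w_q = cluster_weights(w, bits=3)
print(codebook.size, idx.shape, float(np.abs(w - w_q).mean()))
```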
    Machine Learning-Assisted Discovery of Novel Reactor Designs. (arXiv:2308.08841v2 [cs.CE] UPDATED)
    Additive manufacturing has enabled the fabrication of advanced reactor geometries, permitting larger, more complex design spaces. Identifying promising configurations within such spaces presents a significant challenge for current approaches. Furthermore, existing parameterisations of reactor geometries are low-dimensional with expensive optimisation limiting more complex solutions. To address this challenge, we establish a machine learning-assisted approach for the design of the next-generation of chemical reactors, combining the application of high-dimensional parameterisations, computational fluid dynamics, and multi-fidelity Bayesian optimisation. We associate the development of mixing-enhancing vortical flow structures in novel coiled reactors with performance, and use our approach to identify key characteristics of optimal designs. By appealing to fluid mechanical principles, we rationalise the selection of novel design features that lead to experimental performance improvements of ~60% over conventional designs. Our results demonstrate that coupling advanced manufacturing techniques with `augmented-intelligence' approaches can lead to superior design performance and, consequently, emissions-reduction and sustainability.  ( 2 min )
    Transferable Graph Neural Fingerprint Models for Quick Response to Future Bio-Threats. (arXiv:2308.01921v2 [q-bio.BM] UPDATED)
    Fast screening of drug molecules based on the ligand binding affinity is an important step in the drug discovery pipeline. Graph neural fingerprint is a promising method for developing molecular docking surrogates with high throughput and great fidelity. In this study, we built a COVID-19 drug docking dataset of about 300,000 drug candidates on 23 coronavirus protein targets. With this dataset, we trained graph neural fingerprint docking models for high-throughput virtual COVID-19 drug screening. The graph neural fingerprint models yield high prediction accuracy on docking scores with the mean squared error lower than $0.21$ kcal/mol for most of the docking targets, showing significant improvement over conventional circular fingerprint methods. To make the neural fingerprints transferable for unknown targets, we also propose a transferable graph neural fingerprint method trained on multiple targets. With comparable accuracy to target-specific graph neural fingerprint models, the transferable model exhibits superb training and data efficiency. We highlight that the impact of this study extends beyond COVID-19 dataset, as our approach for fast virtual ligand screening can be easily adapted and integrated into a general machine learning-accelerated pipeline to battle future bio-threats.  ( 3 min )
    Semantic Adversarial Attacks via Diffusion Models. (arXiv:2309.07398v1 [cs.CV])
    Traditional adversarial attacks concentrate on manipulating clean examples in the pixel space by adding adversarial perturbations. By contrast, semantic adversarial attacks focus on changing semantic attributes of clean examples, such as color, context, and features, which are more feasible in the real world. In this paper, we propose a framework to quickly generate a semantic adversarial attack by leveraging recent diffusion models since semantic information is included in the latent space of well-trained diffusion models. Then there are two variants of this framework: 1) the Semantic Transformation (ST) approach fine-tunes the latent space of the generated image and/or the diffusion model itself; 2) the Latent Masking (LM) approach masks the latent space with another target image and local backpropagation-based interpretation methods. Additionally, the ST approach can be applied in either white-box or black-box settings. Extensive experiments are conducted on CelebA-HQ and AFHQ datasets, and our framework demonstrates great fidelity, generalizability, and transferability compared to other baselines. Our approaches achieve approximately 100% attack success rate in multiple settings with the best FID as 36.61. Code is available at https://github.com/steven202/semantic_adv_via_dm.  ( 2 min )
    Meta-Learning Regrasping Strategies for Physical-Agnostic Objects. (arXiv:2205.11110v2 [cs.RO] UPDATED)
    Grasping inhomogeneous objects in real-world applications remains a challenging task due to the unknown physical properties such as mass distribution and coefficient of friction. In this study, we propose a meta-learning algorithm called ConDex, which incorporates Conditional Neural Processes (CNP) with DexNet-2.0 to autonomously discern the underlying physical properties of objects using depth images. ConDex efficiently acquires physical embeddings from limited trials, enabling precise grasping point estimation. Furthermore, ConDex is capable of updating the predicted grasping quality iteratively from new trials in an online fashion. To the best of our knowledge, we are the first who generate two object datasets focusing on inhomogeneous physical properties with varying mass distributions and friction coefficients. Extensive evaluations in simulation demonstrate ConDex's superior performance over DexNet-2.0 and existing meta-learning-based grasping pipelines. Furthermore, ConDex shows robust generalization to previously unseen real-world objects despite training solely in the simulation. The synthetic and real-world datasets will be published as well.  ( 2 min )
    An Adaptive Federated Relevance Framework for Spatial Temporal Graph Learning. (arXiv:2206.03420v3 [cs.LG] UPDATED)
    Spatial-temporal data contains rich information and has been widely studied in recent years due to the rapid development of relevant applications in many fields. For instance, medical institutions often use electrodes attached to different parts of a patient to analyse the electorencephal data rich with spatial and temporal features for health assessment and disease diagnosis. Existing research has mainly used deep learning techniques such as convolutional neural network (CNN) or recurrent neural network (RNN) to extract hidden spatial-temporal features. Yet, it is challenging to incorporate both inter-dependencies spatial information and dynamic temporal changes simultaneously. In reality, for a model that leverages these spatial-temporal features to fulfil complex prediction tasks, it often requires a colossal amount of training data in order to obtain satisfactory model performance. Considering the above-mentioned challenges, we propose an adaptive federated relevance framework, namely FedRel, for spatial-temporal graph learning in this paper. After transforming the raw spatial-temporal data into high quality features, the core Dynamic Inter-Intra Graph (DIIG) module in the framework is able to use these features to generate the spatial-temporal graphs capable of capturing the hidden topological and long-term temporal correlation information in these graphs. To improve the model generalization ability and performance while preserving the local data privacy, we also design a relevance-driven federated learning module in our framework to leverage diverse data distributions from different participants with attentive aggregations of their models.
    Self-optimizing Feature Generation via Categorical Hashing Representation and Hierarchical Reinforcement Crossing. (arXiv:2309.04612v2 [cs.LG] UPDATED)
    Feature generation aims to generate new and meaningful features to create a discriminative representation space. A generated feature is meaningful when the generated feature is from a feature pair with inherent feature interaction. In the real world, experienced data scientists can identify potentially useful feature-feature interactions, and generate meaningful dimensions from an exponentially large search space, in an optimal crossing form over an optimal generation path. But, machines have limited human-like abilities. We generalize such learning tasks as self-optimizing feature generation. Self-optimizing feature generation imposes several under-addressed challenges on existing systems: meaningful, robust, and efficient generation. To tackle these challenges, we propose a principled and generic representation-crossing framework to solve self-optimizing feature generation. To achieve hashing representation, we propose a three-step approach: feature discretization, feature hashing, and descriptive summarization. To achieve reinforcement crossing, we develop a hierarchical reinforcement feature crossing approach. We present extensive experimental results to demonstrate the effectiveness and efficiency of the proposed method. The code is available at https://github.com/yingwangyang/HRC_feature_cross.git.  ( 2 min )
    What Matters to Enhance Traffic Rule Compliance of Imitation Learning for Automated Driving. (arXiv:2309.07808v1 [cs.CV])
    More research attention has recently been given to end-to-end autonomous driving technologies where the entire driving pipeline is replaced with a single neural network because of its simpler structure and faster inference time. Despite this appealing approach largely reducing the components in the driving pipeline, its simplicity also leads to interpretability problems and safety issues (arXiv:2003.06404). The trained policy is not always compliant with the traffic rules and it is also hard to discover the reason for the misbehavior because of the lack of intermediate outputs. Meanwhile, sensors are also critical to autonomous driving's security and feasibility to perceive the surrounding environment under complex driving scenarios. In this paper, we propose P-CSG, a novel penalty-based imitation learning approach with cross semantics generation sensor fusion technologies to increase the overall performance of end-to-end autonomous driving. We conducted an assessment of our model's performance using the Town 05 Long benchmark, achieving an impressive driving score improvement of over 15%. Furthermore, we conducted robustness evaluations against adversarial attacks like FGSM and Dot attacks, revealing a substantial increase in robustness compared to baseline models. More detailed information, such as code-based resources, ablation studies and videos can be found at https://hk-zh.github.io/p-csg-plus.  ( 2 min )
    Directed Scattering for Knowledge Graph-based Cellular Signaling Analysis. (arXiv:2309.07813v1 [cs.LG])
    Directed graphs are a natural model for many phenomena, in particular scientific knowledge graphs such as molecular interaction or chemical reaction networks that define cellular signaling relationships. In these situations, source nodes typically have distinct biophysical properties from sinks. Due to their ordered and unidirectional relationships, many such networks also have hierarchical and multiscale structure. However, the majority of methods performing node- and edge-level tasks in machine learning do not take these properties into account, and thus have not been leveraged effectively for scientific tasks such as cellular signaling network inference. We propose a new framework called Directed Scattering Autoencoder (DSAE) which uses a directed version of a geometric scattering transform, combined with the non-linear dimensionality reduction properties of an autoencoder and the geometric properties of the hyperbolic space to learn latent hierarchies. We show this method outperforms numerous others on tasks such as embedding directed graphs and learning cellular signaling networks.  ( 2 min )
    Reliability-based cleaning of noisy training labels with inductive conformal prediction in multi-modal biomedical data mining. (arXiv:2309.07332v1 [cs.LG])
    Accurately labeling biomedical data presents a challenge. Traditional semi-supervised learning methods often under-utilize available unlabeled data. To address this, we propose a novel reliability-based training data cleaning method employing inductive conformal prediction (ICP). This method capitalizes on a small set of accurately labeled training data and leverages ICP-calculated reliability metrics to rectify mislabeled data and outliers within vast quantities of noisy training data. The efficacy of the method is validated across three classification tasks within distinct modalities: filtering drug-induced-liver-injury (DILI) literature with title and abstract, predicting ICU admission of COVID-19 patients through CT radiomics and electronic health records, and subtyping breast cancer using RNA-sequencing data. Varying levels of noise to the training labels were introduced through label permutation. Results show significant enhancements in classification performance: accuracy enhancement in 86 out of 96 DILI experiments (up to 11.4%), AUROC and AUPRC enhancements in all 48 COVID-19 experiments (up to 23.8% and 69.8%), and accuracy and macro-average F1 score improvements in 47 out of 48 RNA-sequencing experiments (up to 74.6% and 89.0%). Our method offers the potential to substantially boost classification performance in multi-modal biomedical machine learning tasks. Importantly, it accomplishes this without necessitating an excessive volume of meticulously curated training data.  ( 3 min )
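    A minimal sketch of the inductive-conformal-prediction ingredient (a generic recipe with an illustrative logistic-regression scorer, not the paper's exact pipeline): a model fit on a small clean set yields calibration nonconformity scores, and the resulting p-value for each noisy example flags labels that look inconsistent with their assigned class.

```python
# ICP-style p-values for noisy labels; labels are assumed to be integer classes 0..K-1.
import numpy as np
from sklearn.linear_model import LogisticRegression

def icp_pvalues(X_clean, y_clean, X_cal, y_cal, X_noisy, y_noisy):
    model = LogisticRegression(max_iter=1000).fit(X_clean, y_clean)
    # Nonconformity score = 1 - predicted probability of the assigned label.
    cal_scores = 1.0 - model.predict_proba(X_cal)[np.arange(len(y_cal)), y_cal]
    noisy_scores = 1.0 - model.predict_proba(X_noisy)[np.arange(len(y_noisy)), y_noisy]
    # p-value = fraction of calibration scores at least as extreme as the example's score.
    return np.array([(np.sum(cal_scores >= s) + 1) / (len(cal_scores) + 1) for s in noisy_scores])

# Examples with very small p-values are candidates for relabeling or removal.
```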
    Improving Multimodal Classification of Social Media Posts by Leveraging Image-Text Auxiliary tasks. (arXiv:2309.07794v1 [cs.CL])
    Effectively leveraging multimodal information from social media posts is essential to various downstream tasks such as sentiment analysis, sarcasm detection and hate speech classification. However, combining text and image information is challenging because of the idiosyncratic cross-modal semantics with hidden or complementary information present in matching image-text pairs. In this work, we aim to directly model this by proposing the use of two auxiliary losses jointly with the main task when fine-tuning any pre-trained multimodal model. Image-Text Contrastive (ITC) brings image-text representations of a post closer together and separates them from different posts, capturing underlying dependencies. Image-Text Matching (ITM) facilitates the understanding of semantic correspondence between images and text by penalizing unrelated pairs. We combine these objectives with five multimodal models, demonstrating consistent improvements across four popular social media datasets. Furthermore, through detailed analysis, we shed light on the specific scenarios and cases where each auxiliary task proves to be most effective.  ( 2 min )
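    The ITC objective described above is typically a symmetric, CLIP-style InfoNCE loss; a compact PyTorch sketch follows (the projection details and the temperature value are assumptions, not taken from the paper): matched image-text pairs in a batch are pulled together and all other pairings are pushed apart.

```python
# Image-Text Contrastive (InfoNCE) loss over a batch of paired embeddings.
import torch
import torch.nn.functional as F

def itc_loss(image_emb, text_emb, temperature=0.07):
    """image_emb, text_emb: (batch, dim) projections of the two modalities."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature          # (batch, batch) similarity matrix
    targets = torch.arange(image_emb.size(0), device=logits.device)
    # Each image should match its own caption, and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```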
    Benchmarking Jetson Edge Devices with an End-to-end Video-based Anomaly Detection System. (arXiv:2307.16834v3 [cs.CV] UPDATED)
    Innovative enhancement in embedded system platforms, specifically hardware accelerations, significantly influence the application of deep learning in real-world scenarios. These innovations translate human labor efforts into automated intelligent systems employed in various areas such as autonomous driving, robotics, Internet-of-Things (IoT), and numerous other impactful applications. NVIDIA's Jetson platform is one of the pioneers in offering optimal performance regarding energy efficiency and throughput in the execution of deep learning algorithms. Previously, most benchmarking analysis was based on 2D images with a single deep learning model for each comparison result. In this paper, we implement an end-to-end video-based crime-scene anomaly detection system inputting from surveillance videos and the system is deployed and completely operates on multiple Jetson edge devices (Nano, AGX Xavier, Orin Nano). The comparison analysis includes the integration of Torch-TensorRT as a software developer kit from NVIDIA for the model performance optimisation. The system is built based on the PySlowfast open-source project from Facebook as the coding template. The end-to-end system process comprises the videos from camera, data preprocessing pipeline, feature extractor and the anomaly detection. We provide the experience of an AI-based system deployment on various Jetson Edge devices with Docker technology. Regarding anomaly detectors, a weakly supervised video-based deep learning model called Robust Temporal Feature Magnitude Learning (RTFM) is applied in the system. The approach system reaches 47.56 frames per second (FPS) inference speed on a Jetson edge device with only 3.11 GB RAM usage total. We also discover the promising Jetson device that the AI system achieves 15% better performance than the previous version of Jetson devices while consuming 50% less energy power.  ( 3 min )
    Multi-step prediction of chlorophyll concentration based on Adaptive Graph-Temporal Convolutional Network with Series Decomposition. (arXiv:2309.07187v1 [cs.LG])
    Chlorophyll concentration reflects the nutritional status and algal blooms of water bodies and is an important indicator for evaluating water quality. Predicting the trend of chlorophyll concentration is therefore of great significance to environmental protection and aquaculture. However, there is a complex and hard-to-disentangle nonlinear relationship between the many factors affecting chlorophyll concentration. In order to effectively mine the nonlinear features contained in the data, this paper proposes a time-series decomposition adaptive graph-temporal convolutional network (AGTCNSD) prediction model. Firstly, the original sequence is decomposed into a trend component and a periodic component by the moving average method. Secondly, the water quality parameter data are modeled with a graph convolutional neural network, and a parameter embedding matrix is defined; the idea of matrix decomposition is used to assign weight parameters to each node. The adaptive graph convolution learns the relationships between different water quality parameters, updates the state information of each parameter, and improves the ability to learn the update relationships between nodes. Finally, temporal dependence is captured by temporal convolution to achieve multi-step prediction of chlorophyll concentration. The validity of the model is verified on water quality data from the coastal city of Beihai. The results show that the prediction performance of this method is better than that of other methods, and it can serve as a scientific resource for environmental management decision-making.  ( 3 min )
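    A minimal sketch of the series-decomposition step, assuming a simple centred moving average; the window length and the toy signal are placeholders.

```python
import numpy as np

def decompose(series, window=24):
    """Split a series into a smooth trend (moving average) and the residual periodic part."""
    kernel = np.ones(window) / window
    trend = np.convolve(series, kernel, mode="same")   # simple centred moving average
    periodic = series - trend
    return trend, periodic

t = np.arange(500)
chl = 0.01 * t + np.sin(2 * np.pi * t / 24) + 0.1 * np.random.randn(500)  # toy chlorophyll-like signal
trend, periodic = decompose(chl)
```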
    Reasoning with Language Model Prompting: A Survey. (arXiv:2212.09597v7 [cs.CL] UPDATED)
    Reasoning, as an essential ability for complex problem-solving, can provide back-end support for various real-world applications, such as medical diagnosis, negotiation, etc. This paper provides a comprehensive survey of cutting-edge research on reasoning with language model prompting. We introduce research works with comparisons and summaries and provide systematic resources to help beginners. We also discuss the potential reasons for the emergence of such reasoning abilities and highlight future research directions. Resources are available at https://github.com/zjunlp/Prompt4ReasoningPapers (updated periodically).  ( 2 min )
    Detecting Misinformation with LLM-Predicted Credibility Signals and Weak Supervision. (arXiv:2309.07601v1 [cs.CL])
    Credibility signals represent a wide range of heuristics that are typically used by journalists and fact-checkers to assess the veracity of online content. Automating the task of credibility signal extraction, however, is very challenging as it requires high-accuracy signal-specific extractors to be trained, while there are currently no sufficiently large datasets annotated with all credibility signals. This paper investigates whether large language models (LLMs) can be prompted effectively with a set of 18 credibility signals to produce weak labels for each signal. We then aggregate these potentially noisy labels using weak supervision in order to predict content veracity. We demonstrate that our approach, which combines zero-shot LLM credibility signal labeling and weak supervision, outperforms state-of-the-art classifiers on two misinformation datasets without using any ground-truth labels for training. We also analyse the contribution of the individual credibility signals towards predicting content veracity, which provides new valuable insights into their role in misinformation detection.  ( 2 min )
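    A minimal sketch of the aggregation step, with a simple majority-vote label model standing in for the weak-supervision model actually used; the weak label matrix below is hypothetical.

```python
import numpy as np

# Hypothetical weak labels: rows = documents, columns = credibility signals,
# entries in {-1 (signals misinformation), +1 (signals credible), 0 (abstain)}.
weak_labels = np.array([
    [+1,  0, +1, -1],
    [-1, -1,  0, -1],
    [ 0, +1, +1, +1],
])

def majority_vote(L):
    """Simplest label model: sign of the per-document sum of non-abstaining votes."""
    totals = L.sum(axis=1)
    return np.where(totals > 0, 1, np.where(totals < 0, -1, 0))

print(majority_vote(weak_labels))   # -> [ 1 -1  1 ]
```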
    Text Classification of Cancer Clinical Trial Eligibility Criteria. (arXiv:2309.07812v1 [cs.CL])
    Automatic identification of clinical trials for which a patient is eligible is complicated by the fact that trial eligibility is stated in natural language. A potential solution to this problem is to employ text classification methods for common types of eligibility criteria. In this study, we focus on seven common exclusion criteria in cancer trials: prior malignancy, human immunodeficiency virus, hepatitis B, hepatitis C, psychiatric illness, drug/substance abuse, and autoimmune illness. Our dataset consists of 764 phase III cancer trials with these exclusions annotated at the trial level. We experiment with common transformer models as well as a new pre-trained clinical trial BERT model. Our results demonstrate the feasibility of automatically classifying common exclusion criteria. Additionally, we demonstrate the value of a pre-trained language model specifically for clinical trials, which yields the highest average performance across all criteria.  ( 2 min )
    Understanding Vector-Valued Neural Networks and Their Relationship with Real and Hypercomplex-Valued Neural Networks. (arXiv:2309.07716v1 [cs.LG])
    Despite the many successful applications of deep learning models for multidimensional signal and image processing, most traditional neural networks process data represented by (multidimensional) arrays of real numbers. The intercorrelation between feature channels is usually expected to be learned from the training data, requiring numerous parameters and careful training. In contrast, vector-valued neural networks are conceived to process arrays of vectors and naturally consider the intercorrelation between feature channels. Consequently, they usually have fewer parameters and often undergo more robust training than traditional neural networks. This paper aims to present a broad framework for vector-valued neural networks, referred to as V-nets. In this context, hypercomplex-valued neural networks are regarded as vector-valued models with additional algebraic properties. Furthermore, this paper explains the relationship between vector-valued and traditional neural networks. Precisely, a vector-valued neural network can be obtained by placing restrictions on a real-valued model to consider the intercorrelation between feature channels. Finally, we show how V-nets, including hypercomplex-valued neural networks, can be implemented in current deep-learning libraries as real-valued networks.  ( 2 min )
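    As a concrete special case of the last point, the sketch below implements a complex-valued (hence vector-valued) linear layer using only real-valued PyTorch operations; V-nets over other algebras impose analogous constraints on a real-valued layer, and the paper's framework is more general than this example.

```python
import torch
import torch.nn as nn

class ComplexLinear(nn.Module):
    """A vector-valued (here complex-valued) linear layer expressed with real tensors.
    Input/output shape: (..., features, 2), where the last axis holds (real, imag)."""
    def __init__(self, in_features, out_features):
        super().__init__()
        self.wr = nn.Linear(in_features, out_features, bias=False)  # real part of the weights
        self.wi = nn.Linear(in_features, out_features, bias=False)  # imaginary part of the weights

    def forward(self, x):
        xr, xi = x[..., 0], x[..., 1]
        # (a + bi)(c + di) = (ac - bd) + (ad + bc)i, applied feature-wise by the linear maps
        yr = self.wr(xr) - self.wi(xi)
        yi = self.wr(xi) + self.wi(xr)
        return torch.stack([yr, yi], dim=-1)
```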
    Entropy-based machine learning model for diagnosis and monitoring of Parkinson's Disease in smart IoT environment. (arXiv:2309.07134v1 [eess.SP])
    The study presents the concept of a computationally efficient machine learning (ML) model for diagnosing and monitoring Parkinson's disease (PD) in an Internet of Things (IoT) environment using rest-state EEG signals (rs-EEG). We computed different types of entropy from EEG signals and found that Fuzzy Entropy performed the best in diagnosing and monitoring PD using rs-EEG. We also investigated different combinations of signal frequency ranges and EEG channels to accurately diagnose PD. Finally, with a small number of features (11 features), we achieved a maximum classification accuracy (ARKF) of ~99.9%. The most prominent frequency range of EEG signals has been identified, and we have found that high classification accuracy depends on low-frequency signal components (0-4 Hz). Moreover, the most informative signals were mainly received from the right hemisphere of the head (F8, P8, T8, FC6). Furthermore, we assessed the accuracy of PD diagnosis using three different lengths of EEG data (150-1000 samples), since the computational complexity is reduced by reducing the amount of input data. As a result, we have achieved a maximum mean accuracy of 99.9% for a sample length (LEEG) of 1000 (~7.8 seconds), 98.2% with a LEEG of 800 (~6.2 seconds), and 79.3% for LEEG = 150 (~1.2 seconds). By reducing the number of features and segment lengths, the computational cost of classification can be reduced. Lower-performance smart ML sensors can therefore be used in IoT environments to enhance human resilience to PD.  ( 3 min )
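    A simplified sketch of a fuzzy entropy computation for a 1-D signal, following the common Chen-style formulation; the embedding dimension, tolerance, and fuzzy power are placeholder defaults, and the exact variant used in the paper may differ.

```python
import numpy as np

def fuzzy_entropy(x, m=2, r=0.2, n=2):
    """Fuzzy entropy of a 1-D signal. r is given as a fraction of the signal's std."""
    x = np.asarray(x, dtype=float)
    r = r * x.std()
    N = len(x)

    def phi(dim):
        # Baseline-removed embedding vectors of length `dim`
        vecs = np.array([x[i:i + dim] - x[i:i + dim].mean() for i in range(N - m)])
        # Pairwise Chebyshev distances and fuzzy (exponential) membership degrees
        d = np.max(np.abs(vecs[:, None, :] - vecs[None, :, :]), axis=-1)
        D = np.exp(-(d ** n) / r)
        np.fill_diagonal(D, 0.0)
        return D.sum() / ((N - m) * (N - m - 1))

    return np.log(phi(m)) - np.log(phi(m + 1))
```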
    Random Feature Amplification: Feature Learning and Generalization in Neural Networks. (arXiv:2202.07626v4 [cs.LG] UPDATED)
    In this work, we provide a characterization of the feature-learning process in two-layer ReLU networks trained by gradient descent on the logistic loss following random initialization. We consider data with binary labels that are generated by an XOR-like function of the input features. We permit a constant fraction of the training labels to be corrupted by an adversary. We show that, although linear classifiers are no better than random guessing for the distribution we consider, two-layer ReLU networks trained by gradient descent achieve generalization error close to the label noise rate. We develop a novel proof technique that shows that at initialization, the vast majority of neurons function as random features that are only weakly correlated with useful features, and the gradient descent dynamics 'amplify' these weak, random features to strong, useful features.  ( 2 min )
    deepFDEnet: A Novel Neural Network Architecture for Solving Fractional Differential Equations. (arXiv:2309.07684v1 [cs.LG])
    The primary goal of this research is to propose a novel architecture for a deep neural network that can solve fractional differential equations accurately. A Gaussian integration rule and an $L_1$ discretization technique are used in the proposed design. In each equation, a deep neural network is used to approximate the unknown function. Three forms of fractional differential equations have been examined to highlight the method's versatility: a fractional ordinary differential equation, a fractional order integrodifferential equation, and a fractional order partial differential equation. The results show that the proposed architecture solves different forms of fractional differential equations with excellent precision.  ( 2 min )
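    As an illustration of the $L_1$ discretization mentioned above, the sketch below approximates a Caputo derivative of order $0<\alpha<1$ on a uniform grid; it is the standard scheme, not the paper's specific network architecture.

```python
import numpy as np
from math import gamma

def caputo_l1(u, h, alpha):
    """L1 approximation of the Caputo derivative of order 0 < alpha < 1 at the last grid point.
    u: array of samples u(t_0), ..., u(t_n) on a uniform grid with step h."""
    n = len(u) - 1
    k = np.arange(n)
    b = (k + 1) ** (1 - alpha) - k ** (1 - alpha)   # L1 weights
    diffs = u[n - k] - u[n - k - 1]                  # u(t_{n-k}) - u(t_{n-k-1})
    return (h ** (-alpha) / gamma(2 - alpha)) * np.sum(b * diffs)

# Sanity check: the Caputo derivative of u(t) = t is t^(1-alpha) / Gamma(2-alpha)
h, alpha = 1e-3, 0.5
t = np.arange(0, 1 + h, h)
approx = caputo_l1(t, h, alpha)
exact = 1.0 ** (1 - alpha) / gamma(2 - alpha)
```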
    Interpretable Weighted Siamese Network to Predict the Time to Onset of Alzheimer's Disease from MRI Images. (arXiv:2304.07097v2 [eess.IV] UPDATED)
    Alzheimer's Disease (AD) is a progressive disease preceded by Mild Cognitive Impairment (MCI). Early detection of AD is crucial for making treatment decisions. However, most of the literature on computer-assisted detection of AD focuses on classifying brain images into one of three major categories: healthy, MCI, and AD; or categorizing MCI patients into (1) progressive: those who progress from MCI to AD at a future examination time, and (2) stable: those who stay as MCI and never progress to AD. This misses the opportunity to accurately identify the trajectory of progressive MCI patients. In this paper, we revisit the brain image classification task for AD identification and re-frame it as an ordinal classification task to predict how close a patient is to the severe AD stage. To this end, we select progressive MCI patients from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset and construct an ordinal dataset with a prediction target that indicates the time to progression to AD. We train a Siamese network model to predict the time to onset of AD based on MRI brain images. We also propose a weighted variant of the Siamese network and compare its performance to a baseline model. Our evaluations show that incorporating a weighting factor into Siamese networks brings a considerable performance gain in predicting how close input brain MRI images are to progressing to AD. Moreover, we complement our results with an interpretation of the learned embedding space of the Siamese networks using a model explainability technique.  ( 3 min )
    Reward is not Necessary: How to Create a Compositional Self-Preserving Agent for Life-Long Learning. (arXiv:2211.10851v3 [cs.AI] UPDATED)
    Reinforcement Learning views the maximization of rewards and avoidance of punishments as central to explaining goal-directed behavior. However, over a life, organisms will need to learn about many different aspects of the world's structure: the states of the world and state-vector transition dynamics. The number of combinations of states grows exponentially as an agent incorporates new knowledge, and there is no obvious weighted combination of pre-existing rewards or costs defined for a given combination of states, as such a weighting would need to encode information about good and bad combinations prior to an agent's experience in the world. Therefore, we must develop more naturalistic accounts of behavior and motivation in large state-spaces. We show that it is possible to use only the intrinsic motivation metric of empowerment, which measures the agent's capacity to realize many possible futures under a transition operator. We propose to scale empowerment to hierarchical state-spaces by using Operator Bellman Equations. These equations produce state-time feasibility functions, which are compositional hierarchical state-time transition operators that map an initial state and time when an agent begins a policy to the final states and times of completing a goal. Because these functions are hierarchical operators we can define hierarchical empowerment measures on them. An agent can then optimize plans to distant states and times to maximize its hierarchical empowerment-gain, allowing it to discover goals that bring about a more favorable coupling of its internal structure (physiological states) to its external environment (world structure & spatial state). Life-long agents could therefore be primarily animated by principles of compositionality and empowerment, exhibiting self-concern for the growth & maintenance of their own structural integrity without recourse to reward-maximization.  ( 3 min )
    Masked Transformer for Electrocardiogram Classification. (arXiv:2309.07136v1 [eess.SP])
    Electrocardiogram (ECG) is one of the most important diagnostic tools in clinical applications. With the advent of advanced algorithms, various deep learning models have been adopted for ECG tasks. However, the potential of Transformers for ECG data is not yet realized, despite their widespread success in computer vision and natural language processing. In this work, we present a useful masked Transformer method for ECG classification referred to as MTECG, which expands the application of masked autoencoders to ECG time series. We construct a dataset comprising 220,251 ECG recordings with a broad range of diagnoses annotated by medical experts to explore the properties of MTECG. Under the proposed training strategies, a lightweight model with 5.7M parameters performs stably well on a broad range of masking ratios (5%-75%). The ablation studies highlight the importance of fluctuated reconstruction targets, training schedule length, layer-wise LR decay and DropPath rate. The experiments on both private and public ECG datasets demonstrate that MTECG-T significantly outperforms the recent state-of-the-art algorithms in ECG classification.  ( 2 min )
    Structure-Preserving Transformers for Sequences of SPD Matrices. (arXiv:2309.07579v1 [cs.LG])
    In recent years, Transformer-based auto-attention mechanisms have been successfully applied to the analysis of a variety of context-reliant data types, from texts to images and beyond, including data from non-Euclidean geometries. In this paper, we present such a mechanism, designed to classify sequences of Symmetric Positive Definite matrices while preserving their Riemannian geometry throughout the analysis. We apply our method to automatic sleep staging on timeseries of EEG-derived covariance matrices from a standard dataset, obtaining high levels of stage-wise performance.
    A Novel Local-Global Feature Fusion Framework for Body-weight Exercise Recognition with Pressure Mapping Sensors. (arXiv:2309.07888v1 [cs.CV])
    We present a novel local-global feature fusion framework for body-weight exercise recognition with floor-based dynamic pressure maps. Going one step beyond existing studies that use deep neural networks focused mainly on global feature extraction, the proposed framework combines local and global features, using image processing techniques and YOLO object detection to localize pressure profiles from different body parts and take physical constraints into account. The proposed local feature extraction method generates two sets of high-level local features consisting of cropped pressure mappings and numerical features such as angular orientation, location on the mat, and pressure area. In addition, we adopt knowledge distillation as regularization to preserve the knowledge of the global feature extraction and improve the performance of the exercise recognition. Our experimental results demonstrate a notable 11 percent improvement in F1 score for exercise recognition while preserving label-specific features.  ( 2 min )
    Deep reinforced learning heuristic tested on spin-glass ground states: The larger picture. (arXiv:2302.10848v2 [cond-mat.dis-nn] UPDATED)
    In Changjun Fan et al. [Nature Communications https://doi.org/10.1038/s41467-023-36363-w (2023)], the authors present a deep reinforced learning approach to augment combinatorial optimization heuristics. In particular, they present results for several spin glass ground state problems, for which instances on non-planar networks are generally NP-hard, in comparison with several Monte Carlo based methods, such as simulated annealing (SA) or parallel tempering (PT). Indeed, those results demonstrate that the reinforced learning improves the results over those obtained with SA or PT, or at least allows for reduced runtimes for the heuristics before results of comparable quality have been obtained relative to those other methods. To facilitate the conclusion that their method is ''superior'', the authors pursue two basic strategies: (1) a commercial GUROBI solver is called on to procure a sample of exact ground states as a testbed to compare with, and (2) a head-to-head comparison between the heuristics is given for a sample of larger instances where exact ground states are hard to ascertain. Here, we put these studies into a larger context, showing that the claimed superiority is at best marginal for smaller samples and becomes essentially irrelevant with respect to any sensible approximation of true ground states in the larger samples. For example, this method becomes irrelevant as a means to determine stiffness exponents $\theta$ in $d>2$, as mentioned by the authors, since the problem is not only NP-hard but also requires the subtraction of two almost equal ground-state energies, for which the systematic errors of $\approx 1\%$ in each, as found here, are unacceptable. This larger picture of the method arises from a straightforward finite-size corrections study over the spin glass ensembles the authors employ, using data that has been available for decades.
    DoRA: Domain-Based Self-Supervised Learning Framework for Low-Resource Real Estate Appraisal. (arXiv:2309.00855v3 [cs.LG] UPDATED)
    The marketplace system connecting demands and supplies has been explored to develop unbiased decision-making in valuing properties. Real estate appraisal serves as one of the high-cost property valuation tasks for financial institutions, since it requires domain experts to appraise the estimation based on the corresponding knowledge and the judgment of the market. Existing automated valuation models that reduce the subjectivity of domain experts require a large number of transactions for effective evaluation, which is limited not only by the labeling effort for transactions but also by poor generalizability to newly developing and rural areas. Existing self-supervised learning (SSL) methods for tabular data, which could in principle learn representations from unlabeled real estate sets, neglect various important features and fail to incorporate domain knowledge. In this paper, we propose DoRA, a Domain-based self-supervised learning framework for low-resource Real estate Appraisal. DoRA is pre-trained with an intra-sample geographic prediction as the pretext task based on the metadata of the real estate, equipping the real estate representations with prior domain knowledge. Furthermore, inter-sample contrastive learning is employed to generalize the representations to be robust for the limited transactions of downstream tasks. Our benchmark results on three property types of real-world transactions show that DoRA significantly outperforms the SSL baselines for tabular data, the graph-based methods, and the supervised approaches in the few-shot scenarios by at least 7.6% for MAPE, 11.59% for MAE, and 3.34% for HR10%. We expect DoRA to be useful to other financial practitioners with similar marketplace applications who need general models for properties that are newly built and have limited records. The source code is available at https://github.com/wwweiwei/DoRA.
    Evaluation of Parameter-based Attacks against Embedded Neural Networks with Laser Injection. (arXiv:2304.12876v2 [cs.CR] UPDATED)
    Upcoming certification actions related to the security of machine learning (ML) based systems raise major evaluation challenges that are amplified by the large-scale deployment of models on many hardware platforms. Until recently, most research works focused on API-based attacks that consider an ML model as a pure algorithmic abstraction. However, new implementation-based threats have been revealed, emphasizing the urgency of proposing both practical and simulation-based methods to properly evaluate the robustness of models. A major concern is parameter-based attacks (such as the Bit-Flip Attack, BFA), which highlight the lack of robustness of typical deep neural network models when confronted with accurate and optimal alterations of their internal parameters stored in memory. In a security-testing setting, this work reports, for the first time, a successful practical variant of the BFA on a 32-bit Cortex-M microcontroller using laser fault injection, a standard fault-injection means for security evaluation that enables the injection of spatially and temporally accurate faults. To avoid unrealistic brute-force strategies, we show how simulations help select the most sensitive set of bits among the parameters, taking the laser fault model into account.
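    To make the threat concrete, here is a minimal sketch of what a single bit-flip does to a stored float32 parameter; the array, index, and bit position are arbitrary examples, and the paper's attack additionally selects the most sensitive bits via simulation.

```python
import numpy as np

def flip_bit(weights, index, bit):
    """Flip one bit of one float32 parameter, mimicking the effect of a single fault injection."""
    w = weights.astype(np.float32).copy()
    as_int = w.view(np.uint32)           # reinterpret the float bits as unsigned integers
    as_int[index] ^= np.uint32(1 << bit)
    return w

w = np.array([0.10, -0.52, 1.30], dtype=np.float32)
w_faulty = flip_bit(w, index=1, bit=30)   # flipping a high exponent bit changes the value drastically
```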
    Learning nonparametric DAGs with incremental information via high-order HSIC. (arXiv:2308.05969v2 [cs.LG] UPDATED)
    Score-based methods for learning Bayesian networks (BNs) aim to maximize a global score function. However, if local variables exhibit direct and indirect dependence simultaneously, global optimization of the score function misses edges between variables with an indirect dependence relationship, whose scores are smaller than those with a direct dependence relationship. In this paper, we present an identifiability condition based on a determined subset of parents to identify the underlying DAG. Based on this identifiability condition, we develop a two-phase algorithm, namely the optimal-tuning (OT) algorithm, to locally amend the global optimization. In the optimal phase, an optimization problem based on the first-order Hilbert-Schmidt independence criterion (HSIC) gives an estimated skeleton as the initial determined parent subset. In the tuning phase, the skeleton is locally tuned by deletion, addition and DAG-formalization strategies using the theoretically proved incremental properties of high-order HSIC. Numerical experiments on different synthetic datasets and real-world datasets show that the OT algorithm outperforms existing methods. Especially in the Sigmoid Mix model with graph size ${\rm\bf d=40}$, the structural intervention distance (SID) of the OT algorithm is 329.7 smaller than that obtained by CAM, which indicates that the graph estimated by the OT algorithm misses fewer edges compared with CAM. Source code of the OT algorithm is available at https://github.com/YafeiannWang/optimal-tune-algorithm.
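    For reference, a minimal sketch of the standard empirical HSIC statistic on which the first-order criterion builds, using Gaussian kernels; the bandwidth and the toy data are placeholders.

```python
import numpy as np

def rbf_kernel(x, sigma=1.0):
    d = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d / (2 * sigma ** 2))

def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC between two 1-D samples with Gaussian kernels."""
    n = len(x)
    K, L = rbf_kernel(x, sigma), rbf_kernel(y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n        # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

rng = np.random.default_rng(0)
x = rng.normal(size=200)
print(hsic(x, x ** 2), hsic(x, rng.normal(size=200)))  # dependent pair scores higher
```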
    TensorFlow Chaotic Prediction and Blow Up. (arXiv:2309.07450v1 [cs.LG])
    Predicting the dynamics of chaotic systems is one of the most challenging tasks for neural networks, and machine learning in general. Here we aim to predict the spatiotemporal chaotic dynamics of a high-dimensional non-linear system. In our attempt we use the TensorFlow library, representing the state of the art for deep neural network training and prediction. While our results are encouraging, and show that the dynamics of the considered system can be predicted for a short time, we also indirectly discovered an unexpected and undesirable behavior of the TensorFlow library. More specifically, the longer-term prediction of the system's chaotic behavior quickly deteriorates and blows up due to the nondeterministic behavior of the TensorFlow library. Here we provide numerical evidence of the short-time prediction ability, and of the longer-term predictability blow-up.
    Deep Nonparametric Convexified Filtering for Computational Photography, Image Synthesis and Adversarial Defense. (arXiv:2309.06724v2 [cs.CV] UPDATED)
    We aim to provide a general framework for computational photography that recovers the real scene from imperfect images, via Deep Nonparametric Convexified Filtering (DNCF). It consists of a nonparametric deep network that resembles the physical equations behind image formation tasks such as denoising, super-resolution, inpainting, and flash. DNCF has no parameterization dependent on training data, and therefore has strong generalization and robustness to adversarial image manipulation. During inference, we also encourage the network parameters to be nonnegative, creating a bi-convex function of the input and parameters; this makes the problem amenable to second-order optimization algorithms within limited running time, yielding a 10X acceleration over Deep Image Prior. With these tools, we empirically verify its capability to defend image classification deep networks against adversarial attack algorithms in real time.
    Autotuning Apache TVM-based Scientific Applications Using Bayesian Optimization. (arXiv:2309.07235v1 [cs.LG])
    Apache TVM (Tensor Virtual Machine), an open source machine learning compiler framework designed to optimize computations across various hardware platforms, provides an opportunity to improve the performance of dense matrix factorizations such as LU (Lower Upper) decomposition and Cholesky decomposition on GPUs and AI (Artificial Intelligence) accelerators. In this paper, we propose a new TVM autotuning framework using Bayesian Optimization and use the TVM tensor expression language to implement linear algebra kernels such as LU, Cholesky, and 3mm. We use these scientific computation kernels to evaluate the effectiveness of our methods on a GPU cluster, called Swing, at Argonne National Laboratory. We compare the proposed autotuning framework with the TVM autotuning framework AutoTVM with four tuners and find that our framework outperforms AutoTVM in most cases.
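    A minimal sketch of Bayesian optimization over tuning knobs, assuming scikit-optimize and a hypothetical `time_kernel` function that stands in for building, running, and timing a TVM schedule; it is a generic illustration, not the paper's framework.

```python
from skopt import gp_minimize
from skopt.space import Integer

def time_kernel(tile_i, tile_j):
    """Hypothetical placeholder: build a schedule with these tile sizes, run it, return the runtime (s)."""
    return abs(tile_i - 16) * 0.01 + abs(tile_j - 8) * 0.02 + 0.1   # synthetic stand-in for a measurement

space = [Integer(1, 64, name="tile_i"), Integer(1, 64, name="tile_j")]
result = gp_minimize(lambda p: time_kernel(*p), space, n_calls=30, random_state=0)
print(result.x, result.fun)   # best tile sizes found and the corresponding (synthetic) runtime
```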
    Interpretability is in the Mind of the Beholder: A Causal Framework for Human-interpretable Representation Learning. (arXiv:2309.07742v1 [cs.LG])
    Focus in Explainable AI is shifting from explanations defined in terms of low-level elements, such as input features, to explanations encoded in terms of interpretable concepts learned from data. How to reliably acquire such concepts is, however, still fundamentally unclear. An agreed-upon notion of concept interpretability is missing, with the result that concepts used by both post-hoc explainers and concept-based neural networks are acquired through a variety of mutually incompatible strategies. Critically, most of these neglect the human side of the problem: a representation is understandable only insofar as it can be understood by the human at the receiving end. The key challenge in Human-interpretable Representation Learning (HRL) is how to model and operationalize this human element. In this work, we propose a mathematical framework for acquiring interpretable representations suitable for both post-hoc explainers and concept-based neural networks. Our formalization of HRL builds on recent advances in causal representation learning and explicitly models a human stakeholder as an external observer. This allows us to derive a principled notion of alignment between the machine representation and the vocabulary of concepts understood by the human. In doing so, we link alignment and interpretability through a simple and intuitive name transfer game, and clarify the relationship between alignment and a well-known property of representations, namely disentanglement. We also show that alignment is linked to the issue of undesirable correlations among concepts, also known as concept leakage, and to content-style separation, all through a general information-theoretic reformulation of these properties. Our conceptualization aims to bridge the gap between the human and algorithmic sides of interpretability and establish a stepping stone for new research on human-interpretable representations.
    Learning to Warm-Start Fixed-Point Optimization Algorithms. (arXiv:2309.07835v1 [math.OC])
    We introduce a machine-learning framework to warm-start fixed-point optimization algorithms. Our architecture consists of a neural network mapping problem parameters to warm starts, followed by a predefined number of fixed-point iterations. We propose two loss functions designed to either minimize the fixed-point residual or the distance to a ground truth solution. In this way, the neural network predicts warm starts with the end-to-end goal of minimizing the downstream loss. An important feature of our architecture is its flexibility, in that it can predict a warm start for fixed-point algorithms run for any number of steps, without being limited to the number of steps it has been trained on. We provide PAC-Bayes generalization bounds on unseen data for common classes of fixed-point operators: contractive, linearly convergent, and averaged. Applying this framework to well-known applications in control, statistics, and signal processing, we observe a significant reduction in the number of iterations and solution time required to solve these problems, through learned warm starts.
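    A minimal PyTorch sketch of the architecture, assuming a generic differentiable fixed-point operator and the fixed-point-residual variant of the loss; the layer sizes and the number of unrolled steps are placeholders.

```python
import torch
import torch.nn as nn

class LearnedWarmStart(nn.Module):
    """Map problem parameters theta to a warm start, then run K fixed-point iterations of T."""
    def __init__(self, param_dim, var_dim, fixed_point_op, K=10):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(param_dim, 64), nn.ReLU(), nn.Linear(64, var_dim))
        self.T = fixed_point_op   # callable (z, theta) -> T(z, theta), assumed differentiable
        self.K = K

    def forward(self, theta):
        z = self.net(theta)                    # predicted warm start
        for _ in range(self.K):
            z = self.T(z, theta)               # unrolled fixed-point iterations
        return z

def residual_loss(model, theta):
    """Train to minimize the fixed-point residual ||T(z_K) - z_K|| after K steps."""
    z = model(theta)
    return ((model.T(z, theta) - z) ** 2).sum(dim=-1).mean()
```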
    On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems. (arXiv:1906.00331v9 [cs.LG] UPDATED)
    We consider nonconvex-concave minimax problems, $\min_{\mathbf{x}} \max_{\mathbf{y} \in \mathcal{Y}} f(\mathbf{x}, \mathbf{y})$, where $f$ is nonconvex in $\mathbf{x}$ but concave in $\mathbf{y}$ and $\mathcal{Y}$ is a convex and bounded set. One of the most popular algorithms for solving this problem is the celebrated gradient descent ascent (GDA) algorithm, which has been widely used in machine learning, control theory and economics. Despite the extensive convergence results for the convex-concave setting, GDA with equal stepsize can converge to limit cycles or even diverge in a general setting. In this paper, we present the complexity results on two-time-scale GDA for solving nonconvex-concave minimax problems, showing that the algorithm can find a stationary point of the function $\Phi(\cdot) := \max_{\mathbf{y} \in \mathcal{Y}} f(\cdot, \mathbf{y})$ efficiently. To the best of our knowledge, this is the first nonasymptotic analysis for two-time-scale GDA in this setting, shedding light on its superior practical performance in training generative adversarial networks (GANs) and other real applications.
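    A minimal sketch of two-time-scale GDA on a toy objective, with a much larger stepsize for the ascent variable and a projection back onto the bounded set $\mathcal{Y}$; the stepsizes and the objective are illustrative only.

```python
import torch

def two_time_scale_gda(f, x, y, eta_x=1e-3, eta_y=1e-1, steps=1000, project_y=lambda y: y):
    """Gradient descent on x, ascent on y, with a much larger stepsize for y (two time scales).
    project_y should map y back onto the bounded set Y after each ascent step."""
    for _ in range(steps):
        gx, gy = torch.autograd.grad(f(x, y), (x, y))
        with torch.no_grad():
            x -= eta_x * gx               # descent on the x variable
            y += eta_y * gy               # ascent on the concave y variable
            y.copy_(project_y(y))
    return x, y

# Toy example: f(x, y) = x * y - 0.1 * y**2 with y constrained to [-1, 1]
x = torch.tensor(1.0, requires_grad=True)
y = torch.tensor(0.0, requires_grad=True)
x, y = two_time_scale_gda(lambda x, y: x * y - 0.1 * y**2, x, y,
                          project_y=lambda y: y.clamp(-1, 1))
```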
    Causal Entropy and Information Gain for Measuring Causal Control. (arXiv:2309.07703v1 [cs.LG])
    Artificial intelligence models and methods commonly lack causal interpretability. Despite the advancements in interpretable machine learning (IML) methods, they frequently assign importance to features which lack causal influence on the outcome variable. Selecting causally relevant features among those identified as relevant by these methods, or even before model training, would offer a solution. Feature selection methods utilizing information theoretical quantities have been successful in identifying statistically relevant features. However, the information theoretical quantities they are based on do not incorporate causality, rendering them unsuitable for such scenarios. To address this challenge, this article proposes information theoretical quantities that incorporate the causal structure of the system, which can be used to evaluate causal importance of features for some given outcome variable. Specifically, we introduce causal versions of entropy and mutual information, termed causal entropy and causal information gain, which are designed to assess how much control a feature provides over the outcome variable. These newly defined quantities capture changes in the entropy of a variable resulting from interventions on other variables. Fundamental results connecting these quantities to the existence of causal effects are derived. The use of causal information gain in feature selection is demonstrated, highlighting its superiority over standard mutual information in revealing which features provide control over a chosen outcome variable. Our investigation paves the way for the development of methods with improved interpretability in domains involving causation.
    Improving physics-informed DeepONets with hard constraints. (arXiv:2309.07899v1 [cs.LG])
    Current physics-informed (standard or operator) neural networks still rely on accurately learning the initial conditions of the system they are solving. In contrast, standard numerical methods evolve such initial conditions without needing to learn these. In this study, we propose to improve current physics-informed deep learning strategies such that initial conditions do not need to be learned and are represented exactly in the predicted solution. Moreover, this method guarantees that when a DeepONet is applied multiple times to time step a solution, the resulting function is continuous.
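    One common way to impose such a hard constraint is the ansatz $u(t,x) = u_0(x) + t\,\mathcal{N}(t,x)$, which satisfies the initial condition exactly; the sketch below wraps a generic network this way for the scalar case and may differ from the paper's exact construction.

```python
import torch
import torch.nn as nn

class HardIC(nn.Module):
    """Ansatz u(t, x) = u0(x) + t * N(t, x): the initial condition u(0, x) = u0(x) holds exactly,
    so it never has to be learned through the loss."""
    def __init__(self, net, u0):
        super().__init__()
        self.net = net       # any network taking concatenated (t, x) of shape (batch, 1 + d)
        self.u0 = u0         # callable giving the known initial condition, shape (batch, 1)

    def forward(self, t, x):
        return self.u0(x) + t * self.net(torch.cat([t, x], dim=-1))
```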
    Identifying the Group-Theoretic Structure of Machine-Learned Symmetries. (arXiv:2309.07860v1 [hep-ph])
    Deep learning was recently successfully used in deriving symmetry transformations that preserve important physics quantities. Being completely agnostic, these techniques postpone the identification of the discovered symmetries to a later stage. In this letter we propose methods for examining and identifying the group-theoretic structure of such machine-learned symmetries. We design loss functions which probe the subalgebra structure either during the deep learning stage of symmetry discovery or in a subsequent post-processing stage. We illustrate the new methods with examples from the U(n) Lie group family, obtaining the respective subalgebra decompositions. As an application to particle physics, we demonstrate the identification of the residual symmetries after the spontaneous breaking of non-Abelian gauge symmetries like SU(3) and SU(5) which are commonly used in model building.
    A DenseNet-based method for decoding auditory spatial attention with EEG. (arXiv:2309.07690v1 [eess.SP])
    Auditory spatial attention detection (ASAD) aims to decode the attended spatial location with EEG in a multiple-speaker setting. ASAD methods are inspired by the brain lateralization of cortical neural responses during the processing of auditory spatial attention, and show promising performance for the task of auditory attention decoding (AAD) with neural recordings. In previous ASAD methods, the spatial distribution of EEG electrodes is not fully exploited, which may limit the performance of these methods. In the present work, by transforming the original EEG channels into a two-dimensional (2D) spatial topological map, the EEG data is rearranged into a three-dimensional (3D) arrangement containing spatial-temporal information. A 3D deep convolutional neural network (DenseNet-3D) is then used to extract temporal and spatial features of the neural representation for the attended locations. The results show that the proposed method achieves higher decoding accuracy than the state-of-the-art (SOTA) method (94.4% compared to XANet's 90.6%) with a 1-second decision window for the widely used KULeuven (KUL) dataset, and the code to implement our work is available on Github: https://github.com/xuxiran/ASAD_DenseNet
    Pareto Adversarial Robustness: Balancing Spatial Robustness and Sensitivity-based Robustness. (arXiv:2111.01996v2 [cs.LG] UPDATED)
    Adversarial robustness, which primarily comprises sensitivity-based robustness and spatial robustness, plays an integral part in achieving robust generalization. In this paper, we endeavor to design strategies to achieve universal adversarial robustness. To achieve this, we first investigate the relatively less-explored realm of spatial robustness. Then, we integrate the existing spatial robustness methods by incorporating both local and global spatial vulnerability into a unified spatial attack and adversarial training approach. Furthermore, we present a comprehensive relationship between natural accuracy, sensitivity-based robustness, and spatial robustness, supported by strong evidence from the perspective of robust representation. Crucially, to reconcile the interplay between the mutual impacts of various robustness components into one unified framework, we incorporate the \textit{Pareto criterion} into the adversarial robustness analysis, yielding a novel strategy called Pareto Adversarial Training for achieving universal robustness. The resulting Pareto front, which delineates the set of optimal solutions, provides an optimal balance between natural accuracy and various adversarial robustness. This sheds light on solutions for achieving universal robustness in the future. To the best of our knowledge, we are the first to consider universal adversarial robustness via multi-objective optimization.
    Goal Space Abstraction in Hierarchical Reinforcement Learning via Set-Based Reachability Analysis. (arXiv:2309.07675v1 [cs.LG])
    Open-ended learning benefits immensely from the use of symbolic methods for goal representation as they offer ways to structure knowledge for efficient and transferable learning. However, the existing Hierarchical Reinforcement Learning (HRL) approaches relying on symbolic reasoning are often limited as they require a manual goal representation. The challenge in autonomously discovering a symbolic goal representation is that it must preserve critical information, such as the environment dynamics. In this paper, we propose a developmental mechanism for goal discovery via an emergent representation that abstracts (i.e., groups together) sets of environment states that have similar roles in the task. We introduce a Feudal HRL algorithm that concurrently learns both the goal representation and a hierarchical policy. The algorithm uses symbolic reachability analysis for neural networks to approximate the transition relation among sets of states and to refine the goal representation. We evaluate our approach on complex navigation tasks, showing that the learned representation is interpretable and transferable and results in data-efficient learning.
    Semi-supervised Domain Adaptation on Graphs with Contrastive Learning and Minimax Entropy. (arXiv:2309.07402v1 [cs.LG])
    Label scarcity in a graph is frequently encountered in real-world applications due to the high cost of data labeling. To this end, semi-supervised domain adaptation (SSDA) on graphs aims to leverage the knowledge of a labeled source graph to aid in node classification on a target graph with limited labels. SSDA tasks need to overcome the domain gap between the source and target graphs. However, to date, this challenging research problem has yet to be formally considered by the existing approaches designed for cross-graph node classification. To tackle the SSDA problem on graphs, a novel method called SemiGCL is proposed, which benefits from graph contrastive learning and minimax entropy training. SemiGCL generates informative node representations by contrasting the representations learned from a graph's local and global views. Additionally, SemiGCL is adversarially optimized with the entropy loss of unlabeled target nodes to reduce domain divergence. Experimental results on benchmark datasets demonstrate that SemiGCL outperforms the state-of-the-art baselines on the SSDA tasks.
    Beta Diffusion. (arXiv:2309.07867v1 [cs.LG])
    We introduce beta diffusion, a novel generative modeling method that integrates demasking and denoising to generate data within bounded ranges. Using scaled and shifted beta distributions, beta diffusion utilizes multiplicative transitions over time to create both forward and reverse diffusion processes, maintaining beta distributions in both the forward marginals and the reverse conditionals, given the data at any point in time. Unlike traditional diffusion-based generative models relying on additive Gaussian noise and reweighted evidence lower bounds (ELBOs), beta diffusion is multiplicative and optimized with KL-divergence upper bounds (KLUBs) derived from the convexity of the KL divergence. We demonstrate that the proposed KLUBs are more effective for optimizing beta diffusion compared to negative ELBOs, which can also be derived as the KLUBs of the same KL divergence with its two arguments swapped. The loss function of beta diffusion, expressed in terms of Bregman divergence, further supports the efficacy of KLUBs for optimization. Experimental results on both synthetic data and natural images demonstrate the unique capabilities of beta diffusion in generative modeling of range-bounded data and validate the effectiveness of KLUBs in optimizing diffusion models, thereby making them valuable additions to the family of diffusion-based generative models and the optimization techniques used to train them.
    Conformal Regression in Calorie Prediction for Team Jumbo-Visma. (arXiv:2304.03778v3 [cs.LG] UPDATED)
    UCI WorldTour races, the premier men's elite road cycling tour, are grueling events that put physical fitness and endurance of riders to the test. The coaches of Team Jumbo-Visma have long been responsible for predicting the energy needs of each rider of the Dutch team for every race on the calendar. Those must be estimated to ensure riders have the energy and resources necessary to maintain a high level of performance throughout a race. This task, however, is both time-consuming and challenging, as it requires precise estimates of race speed and power output. Traditionally, the approach to predicting energy needs has relied on judgement and experience of coaches, but this method has its limitations and often leads to inaccurate predictions. In this paper, we propose a new, more effective approach to predicting energy needs for cycling races. By predicting the speed and power with regression models, we provide the coaches with calorie needs estimates for each individual rider per stage instantly. In addition, we compare methods to quantify uncertainty using conformal prediction. The empirical analysis of the jackknife+, jackknife-minmax, jackknife-minmax-after-bootstrap, CV+, CV-minmax, conformalized quantile regression, and inductive conformal prediction methods in conformal prediction reveals that all methods achieve valid prediction intervals. All but minmax-based methods also produce sufficiently narrow prediction intervals for decision-making. Furthermore, methods computing prediction intervals of fixed size produce tighter intervals for low significance values. Among the methods computing intervals of varying length across the input space, inductive conformal prediction computes narrower prediction intervals at larger significance level.
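    A minimal sketch of the inductive (split) conformal prediction baseline among the compared methods, assuming a generic scikit-learn regressor standing in for the speed or power model; the data splits, model choice, and miscoverage level are placeholders.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def split_conformal_interval(X_train, y_train, X_cal, y_cal, X_new, alpha=0.1):
    """Inductive (split) conformal prediction: fixed-width intervals with 1 - alpha coverage."""
    model = GradientBoostingRegressor().fit(X_train, y_train)
    residuals = np.abs(y_cal - model.predict(X_cal))                 # calibration nonconformity scores
    n = len(residuals)
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)             # finite-sample corrected quantile level
    q = np.quantile(residuals, level)
    preds = model.predict(X_new)
    return preds - q, preds + q
```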
    Multi-Source Domain Adaptation meets Dataset Distillation through Dataset Dictionary Learning. (arXiv:2309.07666v1 [cs.LG])
    In this paper, we consider the intersection of two problems in machine learning: Multi-Source Domain Adaptation (MSDA) and Dataset Distillation (DD). On the one hand, the first considers adapting multiple heterogeneous labeled source domains to an unlabeled target domain. On the other hand, the second attacks the problem of synthesizing a small summary containing all the information about the datasets. We thus consider a new problem called MSDA-DD. To solve it, we adapt previous works in the MSDA literature, such as Wasserstein Barycenter Transport and Dataset Dictionary Learning, as well as DD method Distribution Matching. We thoroughly experiment with this novel problem on four benchmarks (Caltech-Office 10, Tennessee-Eastman Process, Continuous Stirred Tank Reactor, and Case Western Reserve University), where we show that, even with as little as 1 sample per class, one achieves state-of-the-art adaptation performance.
    Optimal transport distances for directed, weighted graphs: a case study with cell-cell communication networks. (arXiv:2309.07030v2 [cs.LG] UPDATED)
    Comparing graphs by means of optimal transport has recently gained significant attention, as the distances induced by optimal transport provide both a principled metric between graphs as well as an interpretable description of the associated changes between graphs in terms of a transport plan. As the lack of symmetry introduces challenges in the typically considered formulations, optimal transport distances for graphs have mostly been developed for undirected graphs. Here, we propose two distance measures to compare directed graphs based on variants of optimal transport: (i) an earth mover's distance (Wasserstein) and (ii) a Gromov-Wasserstein (GW) distance. We evaluate these two distances and discuss their relative performance for both simulated graph data and real-world directed cell-cell communication graphs, inferred from single-cell RNA-seq data.
    Preserved Edge Convolutional Neural Network for Sensitivity Enhancement of Deuterium Metabolic Imaging (DMI). (arXiv:2309.04100v2 [eess.IV] UPDATED)
    Purpose: Common to most MRSI techniques, the spatial resolution and the minimal scan duration of Deuterium Metabolic Imaging (DMI) are limited by the achievable SNR. This work presents a deep learning method for sensitivity enhancement of DMI. Methods: A convolutional neural network (CNN) was designed to estimate the 2H-labeled metabolite concentrations from low SNR and distorted DMI FIDs. The CNN was trained with synthetic data that represent a range of SNR levels typically encountered in vivo. The estimation precision was further improved by fine-tuning the CNN with MRI-based edge-preserving regularization for each DMI dataset. The proposed processing method, PReserved Edge ConvolutIonal neural network for Sensitivity Enhanced DMI (PRECISE-DMI), was applied to simulation studies and in vivo experiments to evaluate the anticipated improvements in SNR and investigate the potential for inaccuracies. Results: PRECISE-DMI visually improved the metabolic maps of low SNR datasets, and quantitatively provided higher precision than the standard Fourier reconstruction. Processing of DMI data acquired in rat brain tumor models resulted in more precise determination of 2H-labeled lactate and glutamate + glutamine levels, at increased spatial resolution (from >8 to 2 $\mu$L) or shortened scan time (from 32 to 4 min) compared to standard acquisitions. However, rigorous SD-bias analyses showed that overuse of the edge-preserving regularization can compromise the accuracy of the results. Conclusion: PRECISE-DMI allows a flexible trade-off between enhancing the sensitivity of DMI and minimizing the inaccuracies. With typical settings, the DMI sensitivity can be improved by 3-fold while retaining the capability to detect local signal variations.
    Massively-Parallel Heat Map Sorting and Applications To Explainable Clustering. (arXiv:2309.07486v1 [cs.DS])
    Given a set of points labeled with $k$ labels, we introduce the heat map sorting problem as reordering and merging the points and dimensions while preserving the clusters (labels). A cluster is preserved if it remains connected, i.e., if it is not split into several clusters and no two clusters are merged. We prove the problem is NP-hard and we give a fixed-parameter algorithm with a constant number of rounds in the massively parallel computation model, where each machine has a sublinear memory and the total memory of the machines is linear. We give an approximation algorithm for an NP-hard special case of the problem. We empirically compare our algorithm with k-means and density-based clustering (DBSCAN) using a dimensionality reduction via locality-sensitive hashing on several directed and undirected graphs of email and computer networks.
    Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models. (arXiv:2211.02048v4 [cs.CV] UPDATED)
    During image editing, existing deep generative models tend to re-synthesize the entire output from scratch, including the unedited regions. This leads to a significant waste of computation, especially for minor editing operations. In this work, we present Spatially Sparse Inference (SSI), a general-purpose technique that selectively performs computation for edited regions and accelerates various generative models, including both conditional GANs and diffusion models. Our key observation is that users tend to edit the input image gradually. This motivates us to cache and reuse the feature maps of the original image. Given an edited image, we sparsely apply the convolutional filters to the edited regions while reusing the cached features for the unedited areas. Based on our algorithm, we further propose Sparse Incremental Generative Engine (SIGE) to convert the computation reduction to latency reduction on off-the-shelf hardware. With about $1\%$-area edits, SIGE accelerates DDPM by $3.0\times$ on NVIDIA RTX 3090 and $4.6\times$ on Apple M1 Pro GPU, Stable Diffusion by $7.2\times$ on 3090, and GauGAN by $5.6\times$ on 3090 and $5.2\times$ on M1 Pro GPU. Compared to our conference version, we extend SIGE to accommodate attention layers and apply it to Stable Diffusion. Additionally, we offer support for Apple M1 Pro GPU and include more results with large and sequential edits.
    EnCodecMAE: Leveraging neural codecs for universal audio representation learning. (arXiv:2309.07391v1 [cs.SD])
    The goal of universal audio representation learning is to obtain foundational models that can be used for a variety of downstream tasks involving speech, music or environmental sounds. To approach this problem, methods inspired by self-supervised models from NLP, like BERT, are often used and adapted to audio. These models rely on the discrete nature of text, hence adopting this type of approach for audio processing requires either a change in the learning objective or mapping the audio signal to a set of discrete classes. In this work, we explore the use of EnCodec, a neural audio codec, to generate discrete targets for learning a universal audio model based on a masked autoencoder (MAE). We evaluate this approach, which we call EncodecMAE, on a wide range of audio tasks spanning speech, music and environmental sounds, achieving performance comparable to or better than leading audio representation models.  ( 2 min )
    Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data. (arXiv:2202.05928v4 [cs.LG] UPDATED)
    Benign overfitting, the phenomenon where interpolating models generalize well in the presence of noisy data, was first observed in neural network models trained with gradient descent. To better understand this empirical observation, we consider the generalization error of two-layer neural networks trained to interpolation by gradient descent on the logistic loss following random initialization. We assume the data comes from well-separated class-conditional log-concave distributions and allow for a constant fraction of the training labels to be corrupted by an adversary. We show that in this setting, neural networks exhibit benign overfitting: they can be driven to zero training error, perfectly fitting any noisy training labels, and simultaneously achieve minimax optimal test error. In contrast to previous work on benign overfitting that require linear or kernel-based predictors, our analysis holds in a setting where both the model and learning dynamics are fundamentally nonlinear.
    Simultaneous inference for generalized linear models with unmeasured confounders. (arXiv:2309.07261v1 [stat.ME])
    Tens of thousands of simultaneous hypothesis tests are routinely performed in genomic studies to identify differentially expressed genes. However, due to unmeasured confounders, many standard statistical approaches may be substantially biased. This paper investigates the large-scale hypothesis testing problem for multivariate generalized linear models in the presence of confounding effects. Under arbitrary confounding mechanisms, we propose a unified statistical estimation and inference framework that harnesses orthogonal structures and integrates linear projections into three key stages. It first leverages multivariate responses to separate marginal and uncorrelated confounding effects, recovering the confounding coefficients' column space. Subsequently, latent factors and primary effects are jointly estimated, utilizing $\ell_1$-regularization for sparsity while imposing orthogonality onto confounding coefficients. Finally, we incorporate projected and weighted bias-correction steps for hypothesis testing. Theoretically, we establish various effects' identification conditions and non-asymptotic error bounds. We show effective Type-I error control of asymptotic $z$-tests as sample and response sizes approach infinity. Numerical experiments demonstrate that the proposed method controls the false discovery rate by the Benjamini-Hochberg procedure and is more powerful than alternative methods. By comparing single-cell RNA-seq counts from two groups of samples, we demonstrate the suitability of adjusting confounding effects when significant covariates are absent from the model.
    DGSD: Dynamical Graph Self-Distillation for EEG-Based Auditory Spatial Attention Detection. (arXiv:2309.07147v1 [eess.SP])
    Auditory Attention Detection (AAD) aims to detect the target speaker from brain signals in a multi-speaker environment. Although EEG-based AAD methods have shown promising results in recent years, current approaches primarily rely on traditional convolutional neural networks designed for processing Euclidean data like images. This makes it challenging to handle EEG signals, which possess non-Euclidean characteristics. In order to address this problem, this paper proposes a dynamical graph self-distillation (DGSD) approach for AAD, which does not require speech stimuli as input. Specifically, to effectively represent the non-Euclidean properties of EEG signals, dynamical graph convolutional networks are applied to represent the graph structure of EEG signals, which can also extract crucial features related to auditory spatial attention in EEG signals. In addition, to further improve AAD detection performance, self-distillation, consisting of feature distillation and hierarchical distillation strategies at each layer, is integrated. These strategies leverage features and classification results from the deepest network layers to guide the learning of shallow layers. Our experiments are conducted on two publicly available datasets, KUL and DTU. Under a 1-second time window, we achieve accuracies of 90.0% and 79.6% on KUL and DTU, respectively. We compare our DGSD method with competitive baselines, and the experimental results indicate that the detection performance of our proposed DGSD method is not only superior to the best reproducible baseline but also significantly reduces the number of trainable parameters by approximately 100 times.
    Systematic Review of Experimental Paradigms and Deep Neural Networks for Electroencephalography-Based Cognitive Workload Detection. (arXiv:2309.07163v1 [eess.SP])
    This article summarizes a systematic review of electroencephalography (EEG)-based cognitive workload (CWL) estimation. The focus of the article is twofold: to identify the disparate experimental paradigms used for reliably eliciting discrete and quantifiable levels of cognitive load, and the specific nature and representational structure of the commonly used input formulations in the deep neural networks (DNNs) used for signal classification. The analysis revealed a number of studies using EEG signals in their native representation of a two-dimensional matrix for offline classification of CWL. However, only a few studies adopted an online or pseudo-online classification strategy for real-time CWL estimation. Further, only a couple of interpretable DNNs and a single generative model have been employed for cognitive load detection to date. More often than not, researchers have used DNNs as black-box models. In conclusion, DNNs prove to be valuable tools for classifying EEG signals, primarily due to the substantial modeling power provided by the depth of their network architectures. It is further suggested that interpretable and explainable DNN models must be employed for cognitive workload estimation, since existing methods are limited in the face of the non-stationary nature of the signal.
    The Grand Illusion: The Myth of Software Portability and Implications for ML Progress. (arXiv:2309.07181v1 [cs.SE])
    Pushing the boundaries of machine learning often requires exploring different hardware and software combinations. However, the freedom to experiment across different tooling stacks can be at odds with the drive for efficiency, which has produced increasingly specialized AI hardware and incentivized consolidation around a narrow set of ML frameworks. Exploratory research can be restricted if software and hardware are co-evolving, making it even harder to stray away from mainstream ideas that work well with popular tooling stacks. While this friction increasingly impacts the rate of innovation in machine learning, to our knowledge the lack of portability in tooling has not been quantified. In this work, we ask: How portable are popular ML software frameworks? We conduct a large-scale study of the portability of mainstream ML frameworks across different hardware types. Our findings paint an uncomfortable picture -- frameworks can lose more than 40% of their key functions when ported to other hardware. Worse, even when functions are portable, the slowdown in their performance can be extreme and render performance untenable. Collectively, our results reveal how costly straying from a narrow set of hardware-software combinations can be - and suggest that specialization of hardware impedes innovation in machine learning research.
    The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease detection. (arXiv:2309.07192v1 [eess.IV])
    Machine Learning (ML) has emerged as a promising approach in healthcare, outperforming traditional statistical techniques. However, to establish ML as a reliable tool in clinical practice, adherence to best practices regarding data handling, experimental design, and model evaluation is crucial. This work summarizes and strictly observes such practices to ensure reproducible and reliable ML. Specifically, we focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of a challenging problem in healthcare. We investigate the impact of different data augmentation techniques and model complexity on the overall performance. We consider MRI data from the ADNI dataset to address a classification problem, employing a 3D Convolutional Neural Network (CNN). The experiments are designed to compensate for data scarcity and initial random parameters by utilizing cross-validation and multiple training trials. Within this framework, we train 15 predictive models, considering three different data augmentation strategies and five distinct 3D CNN architectures, each varying in the number of convolutional layers. Specifically, the augmentation strategies are based on affine transformations, such as zoom, shift, and rotation, applied concurrently or separately. The combined effect of data augmentation and model complexity leads to a variation in prediction performance of up to 10% in accuracy. When affine transformations are applied separately, the model is more accurate, independently of the adopted architecture. For all strategies, model accuracy followed a concave trend as the number of convolutional layers increased, peaking at an intermediate number of layers. The best model (8 CL, (B)) is the most stable across cross-validation folds and training trials, reaching excellent performance both on the testing set and on an external test set.
    Improved Auto-Encoding using Deterministic Projected Belief Networks. (arXiv:2309.07481v1 [cs.LG])
    In this paper, we exploit the unique properties of a deterministic projected belief network (D-PBN) to take full advantage of trainable compound activation functions (TCAs). A D-PBN is a type of auto-encoder that operates by "backing up" through a feed-forward neural network. TCAs are activation functions with complex monotonic-increasing shapes that change the distribution of the data so that the linear transformation that follows is more effective. Because a D-PBN operates by "backing up", the TCAs are inverted in the reconstruction process, restoring the original distribution of the data, thus taking advantage of a given TCA in both analysis and reconstruction. In this paper, we show that a D-PBN auto-encoder with TCAs can significantly out-perform standard auto-encoders including variational auto-encoders.
    Statistically Valid Variable Importance Assessment through Conditional Permutations. (arXiv:2309.07593v1 [cs.LG])
    Variable importance assessment has become a crucial step in machine-learning applications when using complex learners, such as deep neural networks, on large-scale data. Removal-based importance assessment is currently the reference approach, particularly when statistical guarantees are sought to justify variable inclusion. It is often implemented with variable permutation schemes. On the flip side, these approaches risk misidentifying unimportant variables as important in the presence of correlations among covariates. Here we develop a systematic approach for studying Conditional Permutation Importance (CPI) that is model-agnostic and computationally lean, together with reusable benchmarks of state-of-the-art variable importance estimators. We show theoretically and empirically that $\textit{CPI}$ overcomes the limitations of standard permutation importance by providing accurate type-I error control. When used with a deep neural network, $\textit{CPI}$ consistently showed top accuracy across benchmarks. An empirical benchmark on real-world data analysis in a large-scale medical dataset showed that $\textit{CPI}$ provides a more parsimonious selection of statistically significant variables. Our results suggest that $\textit{CPI}$ can be readily used as a drop-in replacement for permutation-based methods.
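    Conditional permutation importance differs from standard permutation importance in that only the part of a feature that cannot be predicted from the remaining covariates is shuffled, which preserves the correlation structure among covariates. The following is a minimal, model-agnostic sketch assuming a regression task, an sklearn-style fitted model, and a linear conditional model for the feature; it does not reproduce the paper's estimator or its statistical guarantees.

        import numpy as np
        from sklearn.linear_model import LinearRegression
        from sklearn.metrics import mean_squared_error

        def conditional_permutation_importance(model, X, y, j, n_perm=50, seed=0):
            """Loss increase when feature j is permuted conditionally on the others."""
            rng = np.random.default_rng(seed)
            X = np.asarray(X, dtype=float)
            others = np.delete(X, j, axis=1)
            # Predictable part of x_j given the other covariates; only residuals are shuffled.
            xj_hat = LinearRegression().fit(others, X[:, j]).predict(others)
            resid = X[:, j] - xj_hat
            base_loss = mean_squared_error(y, model.predict(X))
            losses = []
            for _ in range(n_perm):
                X_perm = X.copy()
                X_perm[:, j] = xj_hat + rng.permutation(resid)
                losses.append(mean_squared_error(y, model.predict(X_perm)))
            return np.mean(losses) - base_loss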
    Virchow: A Million-Slide Digital Pathology Foundation Model. (arXiv:2309.07778v1 [eess.IV])
    Computational pathology uses artificial intelligence to enable precision medicine and decision support systems through the analysis of whole slide images. It has the potential to revolutionize the diagnosis and treatment of cancer. However, a major challenge to this objective is that for many specific computational pathology tasks the amount of data is inadequate for development. To address this challenge, we created Virchow, a 632 million parameter deep neural network foundation model for computational pathology. Using self-supervised learning, Virchow is trained on 1.5 million hematoxylin and eosin stained whole slide images from diverse tissue groups, which is orders of magnitude more data than previous works. When evaluated on downstream tasks including tile-level pan-cancer detection and subtyping and slide-level biomarker prediction, Virchow outperforms state-of-the-art systems both on internal datasets drawn from the same population as the pretraining data as well as external public datasets. Virchow achieves 93% balanced accuracy for pancancer tile classification, and AUCs of 0.983 for colon microsatellite instability status prediction and 0.967 for breast CDH1 status prediction. The gains in performance highlight the importance of pretraining on massive pathology image datasets, suggesting pretraining on even larger datasets could continue improving performance for many high-impact applications where limited amounts of training data are available, such as drug outcome prediction.
    PRE: Vision-Language Prompt Learning with Reparameterization Encoder. (arXiv:2309.07760v1 [cs.CV])
    Large pre-trained vision-language models such as CLIP have demonstrated great potential in zero-shot transferability to downstream tasks. However, to attain optimal performance, the manual selection of prompts is necessary to improve alignment between the downstream image distribution and the textual class descriptions. This manual prompt engineering is the major challenge for deploying such models in practice, since it requires domain expertise and is extremely time-consuming. To avoid non-trivial prompt engineering, recent work on Context Optimization (CoOp) introduced the concept of prompt learning to the vision domain using learnable textual tokens. While CoOp can achieve substantial improvements over manual prompts, its learned context generalizes poorly to wider unseen classes within the same dataset. In this work, we present Prompt Learning with Reparameterization Encoder (PRE) - a simple and efficient method that enhances the generalization ability of the learnable prompt to unseen classes while maintaining the capacity to learn Base classes. Instead of directly optimizing the prompts, PRE employs a prompt encoder to reparameterize the input prompt embeddings, enhancing the exploration of task-specific knowledge from few-shot samples. Experiments and extensive ablation studies on 8 benchmarks demonstrate that our approach is an efficient method for prompt learning. Specifically, PRE achieves a notable enhancement of 5.60% in average accuracy on New classes and 3% in Harmonic mean compared to CoOp in the 16-shot setting, all achieved within a reasonable training time.
    Advancing Regular Language Reasoning in Linear Recurrent Neural Networks. (arXiv:2309.07412v1 [cs.CL])
    In recent studies, linear recurrent neural networks (LRNNs) have achieved Transformer-level performance in natural language modeling and long-range modeling while offering rapid parallel training and constant inference costs. With the resurgent interest in LRNNs, we study whether they can learn the hidden rules in training sequences, such as the grammatical structures of regular language. We theoretically analyze some existing LRNNs and discover their limitations on regular language. Motivated by the analysis, we propose a new LRNN equipped with a block-diagonal and input-dependent transition matrix. Experiments suggest that the proposed model is the only LRNN that can perform length extrapolation on regular language tasks such as Sum, Even Pair, and Modular Arithmetic.
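    To make the key architectural ingredient concrete, here is a rough PyTorch sketch of a linear recurrence whose transition matrix is block-diagonal and computed from the current input. The block size, nonlinearity, and parameterization are assumptions for illustration and do not reproduce the authors' exact design (nor its parallel-scan training).

        import torch
        import torch.nn as nn

        class BlockDiagLRNN(nn.Module):
            """h_t = A(x_t) h_{t-1} + B x_t with a block-diagonal, input-dependent A."""
            def __init__(self, d_in, d_hidden, block_size):
                super().__init__()
                assert d_hidden % block_size == 0
                self.nb, self.bs = d_hidden // block_size, block_size
                self.to_blocks = nn.Linear(d_in, self.nb * block_size * block_size)
                self.inp = nn.Linear(d_in, d_hidden)

            def forward(self, x):                       # x: (batch, time, d_in)
                B, T, _ = x.shape
                h = x.new_zeros(B, self.nb, self.bs)
                outs = []
                for t in range(T):
                    A = torch.tanh(self.to_blocks(x[:, t])).view(B, self.nb, self.bs, self.bs)
                    h = torch.einsum('bnij,bnj->bni', A, h) \
                        + self.inp(x[:, t]).view(B, self.nb, self.bs)
                    outs.append(h.reshape(B, -1))
                return torch.stack(outs, dim=1)         # (batch, time, d_hidden)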
    Learning Beyond Similarities: Incorporating Dissimilarities between Positive Pairs in Self-Supervised Time Series Learning. (arXiv:2309.07526v1 [cs.LG])
    By identifying similarities between successive inputs, Self-Supervised Learning (SSL) methods for time series analysis have demonstrated their effectiveness in encoding the inherent static characteristics of temporal data. However, an exclusive emphasis on similarities might result in representations that overlook the dynamic attributes critical for modeling cardiovascular diseases within a confined subject cohort. Introducing Distilled Encoding Beyond Similarities (DEBS), this paper pioneers an SSL approach that transcends mere similarities by integrating dissimilarities among positive pairs. The framework is applied to electrocardiogram (ECG) signals, leading to a notable enhancement of +10\% in the detection accuracy of Atrial Fibrillation (AFib) across diverse subjects. DEBS underscores the potential of attaining a more refined representation by encoding the dynamic characteristics of time series data, tapping into dissimilarities during the optimization process. Broadly, the strategy delineated in this study holds the promise of unearthing novel avenues for advancing SSL methodologies tailored to temporal data.
    Unbiased Face Synthesis With Diffusion Models: Are We There Yet?. (arXiv:2309.07277v1 [cs.CV])
    Text-to-image diffusion models have achieved widespread popularity due to their unprecedented image generation capability. In particular, their ability to synthesize and modify human faces has spurred research into using generated face images in both training data augmentation and model performance assessments. In this paper, we study the efficacy and shortcomings of generative models in the context of face generation. Utilizing a combination of qualitative and quantitative measures, including embedding-based metrics and user studies, we present a framework to audit the characteristics of generated faces conditioned on a set of social attributes. We applied our framework on faces generated through state-of-the-art text-to-image diffusion models. We identify several limitations of face image generation that include faithfulness to the text prompt, demographic disparities, and distributional shifts. Furthermore, we present an analytical model that provides insights into how training data selection contributes to the performance of generative models.
    SC-MAD: Mixtures of Higher-order Networks for Data Augmentation. (arXiv:2309.07453v1 [stat.ML])
    The myriad complex systems with multiway interactions motivate the extension of graph-based pairwise connections to higher-order relations. In particular, the simplicial complex has inspired generalizations of graph neural networks (GNNs) to simplicial complex-based models. Learning on such systems requires large amounts of data, which can be expensive or impossible to obtain. We propose data augmentation of simplicial complexes through both linear and nonlinear mixup mechanisms that return mixtures of existing labeled samples. In addition to traditional pairwise mixup, we present a convex clustering mixup approach for a data-driven relationship among several simplicial complexes. We theoretically demonstrate that the resultant synthetic simplicial complexes interpolate among existing data with respect to homomorphism densities. Our method is demonstrated on both synthetic and real-world datasets for simplicial complex classification.
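    The traditional pairwise mixup ingredient mentioned above follows the usual recipe of taking convex combinations of labeled samples. A minimal sketch on vectorized representations is given below; the paper's operations on simplicial complexes and its convex clustering mixup are more involved and are not reproduced.

        import numpy as np

        def pairwise_mixup(x1, y1, x2, y2, alpha=0.2, seed=None):
            """Convex combination of two labeled samples with a Beta-distributed weight."""
            lam = np.random.default_rng(seed).beta(alpha, alpha)
            return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2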
    Mitigating Adversarial Attacks in Federated Learning with Trusted Execution Environments. (arXiv:2309.07197v1 [cs.LG])
    The main premise of federated learning (FL) is that machine learning model updates are computed locally to preserve user data privacy. By design, this approach prevents user data from ever leaving the perimeter of the device. Once the updates are aggregated, the model is broadcast to all nodes in the federation. However, without proper defenses, compromised nodes can probe the model inside their local memory in search of adversarial examples, which can lead to dangerous real-world scenarios. For instance, in image-based applications, adversarial examples are images that look only slightly perturbed to the human eye yet are misclassified by the local model. These adversarial images are then later presented to a victim node's counterpart model to replay the attack. Typical examples harness dissemination strategies such as altered traffic signs (patch attacks) no longer recognized by autonomous vehicles, or seemingly unaltered samples that poison the local dataset of the FL scheme to undermine its robustness. Pelta is a novel shielding mechanism leveraging Trusted Execution Environments (TEEs) that reduces the ability of attackers to craft adversarial samples. Pelta masks inside the TEE the first part of the back-propagation chain rule, typically exploited by attackers to craft the malicious samples. We evaluate Pelta on state-of-the-art accurate models using three well-established datasets: CIFAR-10, CIFAR-100 and ImageNet. We show the effectiveness of Pelta in mitigating six white-box state-of-the-art adversarial attacks, including Projected Gradient Descent, the Momentum Iterative Method, Auto Projected Gradient Descent, and the Carlini & Wagner attack. In particular, to the best of our knowledge, Pelta constitutes the first attempt at defending an ensemble model against the Self-Attention Gradient attack. Our code is available to the research community at https://github.com/queyrusi/Pelta.
    Exploring Large Language Models for Ontology Alignment. (arXiv:2309.07172v1 [cs.AI])
    This work investigates the applicability of recent generative Large Language Models (LLMs), such as the GPT series and Flan-T5, to ontology alignment for identifying concept equivalence mappings across ontologies. To test the zero-shot performance of Flan-T5-XXL and GPT-3.5-turbo, we leverage challenging subsets from two equivalence matching datasets of the OAEI Bio-ML track, taking into account concept labels and structural contexts. Preliminary findings suggest that LLMs have the potential to outperform existing ontology alignment systems like BERTMap, given careful framework and prompt design.
    Frequency Convergence of Complexon Shift Operators. (arXiv:2309.07169v1 [eess.SP])
    Topological signal processing (TSP) utilizes simplicial complexes to model structures with higher order than vertices and edges. In this paper, we study the transferability of TSP via a generalized higher-order version of graphon, known as complexon. We recall the notion of a complexon as the limit of a simplicial complex sequence [1]. Inspired by the integral operator form of graphon shift operators, we construct a marginal complexon and complexon shift operator (CSO) according to components of all possible dimensions from the complexon. We investigate the CSO's eigenvalues and eigenvectors, and relate them to a new family of weighted adjacency matrices. We prove that when a simplicial complex sequence converges to a complexon, the eigenvalues of the corresponding CSOs converge to that of the limit complexon. These results hint at learning transferability on large simplicial complexes or simplicial complex sequences, which generalize the graphon signal processing framework.
    Audio-Based Classification of Respiratory Diseases using Advanced Signal Processing and Machine Learning for Assistive Diagnosis Support. (arXiv:2309.07183v1 [eess.SP])
    In global healthcare, respiratory diseases are a leading cause of mortality, underscoring the need for rapid and accurate diagnostics. To advance rapid screening techniques via auscultation, our research focuses on employing one of the largest publicly available medical databases of respiratory sounds to train multiple machine learning models able to classify different health conditions. Our method combines Empirical Mode Decomposition (EMD) and spectral analysis to extract physiologically relevant biosignals from acoustic data, closely tied to cardiovascular and respiratory patterns, setting our approach apart from conventional audio feature extraction practices. We use Power Spectral Density analysis and filtering techniques to select Intrinsic Mode Functions (IMFs) strongly correlated with underlying physiological phenomena. These biosignals undergo a comprehensive feature extraction process for predictive modeling. Initially, we deploy a binary classification model that demonstrates a balanced accuracy of 87% in distinguishing between healthy and diseased individuals. Subsequently, we employ a six-class classification model that achieves a balanced accuracy of 72% in diagnosing specific respiratory conditions like pneumonia and chronic obstructive pulmonary disease (COPD). For the first time, we also introduce regression models that estimate age and body mass index (BMI) based solely on acoustic data, as well as a model for gender classification. Our findings underscore the potential of this approach to significantly enhance assistive and remote diagnostic capabilities.
    Boosting Unsupervised Contrastive Learning Using Diffusion-Based Data Augmentation From Scratch. (arXiv:2309.07909v1 [cs.LG])
    Unsupervised contrastive learning methods have recently seen significant improvements, particularly through data augmentation strategies that aim to produce robust and generalizable representations. However, prevailing data augmentation methods, whether hand-designed or based on foundation models, tend to rely heavily on prior knowledge or external data. This dependence often compromises their effectiveness and efficiency. Furthermore, the applicability of most existing data augmentation strategies is limited when transitioning to other research domains, especially science-related data. This limitation stems from the paucity of prior knowledge and labeled data available in these domains. To address these challenges, we introduce DiffAug, a novel and efficient Diffusion-based data Augmentation technique. DiffAug aims to ensure that the augmented and original data share a smoothed latent space, which is achieved through diffusion steps. Unlike traditional methods, DiffAug first mines sufficient prior semantic knowledge about the neighborhood. This provides a constraint to guide the diffusion steps, eliminating the need for labels, external data/models, or prior knowledge. Designed as an architecture-agnostic framework, DiffAug provides consistent improvements. Specifically, it improves image classification and clustering accuracy by 1.6%~4.5%. When applied to biological data, DiffAug improves performance by up to 10.1%, with an average improvement of 5.8%. DiffAug shows good performance in both vision and biological domains.
    Distribution Grid Line Outage Identification with Unknown Pattern and Performance Guarantee. (arXiv:2309.07157v1 [cs.LG])
    Line outage identification in distribution grids is essential for sustainable grid operation. In this work, we propose a practical yet robust detection approach that utilizes only readily available voltage magnitudes, eliminating the need for costly phase angles or power flow data. Given the sensor data, many existing detection methods based on change-point detection require prior knowledge of outage patterns, which are unknown for real-world outage scenarios. To remove this impractical requirement, we propose a data-driven method to learn the parameters of the post-outage distribution through gradient descent. However, directly using gradient descent presents feasibility issues. To address this, we modify our approach by adding a Bregman divergence constraint to control the trajectory of the parameter updates, which eliminates the feasibility problems. As timely operation is key, we prove that the optimal parameters can be learned with convergence guarantees by leveraging the statistical and physical properties of voltage data. We evaluate our approach using many representative distribution grids and real load profiles with 17 outage configurations. The results show that we can detect and localize the outage in a timely manner with only voltage magnitudes and without assuming prior knowledge of outage patterns.
    Effective Latent Differential Equation Models via Attention and Multiple Shooting. (arXiv:2307.05735v3 [cs.LG] UPDATED)
    Scientific Machine Learning (SciML) is a burgeoning field that synergistically combines domain-aware and interpretable models with agnostic machine learning techniques. In this work, we introduce GOKU-UI, an evolution of the SciML generative model GOKU-nets. GOKU-UI not only broadens the original model's spectrum to incorporate other classes of differential equations, such as Stochastic Differential Equations (SDEs), but also integrates attention mechanisms and a novel multiple shooting training strategy in the latent space. These modifications have led to a significant increase in its performance in both reconstruction and forecast tasks, as demonstrated by our evaluation of simulated and empirical data. Specifically, GOKU-UI outperformed all baseline models on synthetic datasets even with a training set 16-fold smaller, underscoring its remarkable data efficiency. Furthermore, when applied to empirical human brain data, while incorporating stochastic Stuart-Landau oscillators into its dynamical core, our proposed enhancements markedly increased the model's effectiveness in capturing complex brain dynamics. This augmented version not only surpassed all baseline methods in the reconstruction task, but also demonstrated lower prediction error of future brain activity up to 15 seconds ahead. By training GOKU-UI on resting state fMRI data, we encoded whole-brain dynamics into a latent representation, learning a low-dimensional dynamical system model that could offer insights into brain functionality and open avenues for practical applications such as the classification of mental states or psychiatric conditions. Ultimately, our research provides further impetus for the field of Scientific Machine Learning, showcasing the potential for advancements when established scientific insights are interwoven with modern machine learning.
    A Strong and Simple Deep Learning Baseline for BCI MI Decoding. (arXiv:2309.07159v1 [eess.SP])
    We propose EEG-SimpleConv, a straightforward 1D convolutional neural network for Motor Imagery decoding in BCI. Our main motivation is to propose a very simple baseline to compare to, using only very standard ingredients from the literature. We evaluate its performance on four EEG Motor Imagery datasets, including simulated online setups, and compare it to recent Deep Learning and Machine Learning approaches. EEG-SimpleConv is at least as good as, or far more efficient than, other approaches, showing strong knowledge-transfer capabilities across subjects, at the cost of only a low inference time. We advocate that using off-the-shelf ingredients rather than coming up with ad-hoc solutions can significantly help the adoption of Deep Learning approaches for BCI. We make the code of the models and the experiments accessible.
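    To give a sense of how small such a baseline can be, here is a hedged PyTorch sketch of a plain 1D convolutional decoder over (channels, time) EEG windows. The channel count, kernel sizes, and depth are illustrative assumptions and do not reproduce the EEG-SimpleConv configuration.

        import torch
        import torch.nn as nn

        class Tiny1DConvNet(nn.Module):
            """A deliberately plain 1D CNN classifier for EEG windows."""
            def __init__(self, n_channels=22, n_classes=4, width=64):
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv1d(n_channels, width, kernel_size=9, padding=4),
                    nn.BatchNorm1d(width), nn.ReLU(), nn.MaxPool1d(2),
                    nn.Conv1d(width, width, kernel_size=9, padding=4),
                    nn.BatchNorm1d(width), nn.ReLU(), nn.AdaptiveAvgPool1d(1),
                )
                self.head = nn.Linear(width, n_classes)

            def forward(self, x):                 # x: (batch, channels, time)
                return self.head(self.features(x).squeeze(-1))

        logits = Tiny1DConvNet()(torch.randn(8, 22, 1000))   # 8 windows, 22 channels, 1000 samples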
    An Optimal Control Method to Compute the Most Likely Transition Path for Stochastic Dynamical Systems with Jumps. (arXiv:2203.16874v2 [math.NA] UPDATED)
    Many complex real world phenomena exhibit abrupt, intermittent or jumping behaviors, which are more suitable to be described by stochastic differential equations under non-Gaussian L\'evy noise. Among these complex phenomena, the most likely transition paths between metastable states are important since these rare events may have a high impact in certain scenarios. Based on the large deviation principle, the most likely transition path could be treated as the minimizer of the rate function upon paths that connect two points. One of the challenges to calculate the most likely transition path for stochastic dynamical systems under non-Gaussian L\'evy noise is that the associated rate function can not be explicitly expressed by paths. For this reason, we formulate an optimal control problem to obtain the optimal state as the most likely transition path. We then develop a neural network method to solve this issue. Several experiments are investigated for both Gaussian and non-Gaussian cases.
    Decoding visual brain representations from electroencephalography through Knowledge Distillation and latent diffusion models. (arXiv:2309.07149v1 [eess.SP])
    Decoding visual representations from human brain activity has emerged as a thriving research domain, particularly in the context of brain-computer interfaces. Our study presents an innovative method that classifies and reconstructs images from the ImageNet dataset using electroencephalography (EEG) data from subjects who had viewed the images themselves (i.e., "brain decoding"). We analyzed EEG recordings from 6 participants, each exposed to 50 images spanning 40 unique semantic categories. These EEG readings were converted into spectrograms, which were then used to train a convolutional neural network (CNN), integrated with a knowledge distillation procedure based on a pre-trained Contrastive Language-Image Pre-Training (CLIP)-based image classification teacher network. This strategy allowed our model to attain a top-5 accuracy of 80%, significantly outperforming a standard CNN and various RNN-based benchmarks. Additionally, we incorporated an image reconstruction mechanism based on pre-trained latent diffusion models, which allowed us to generate an estimate of the images that had elicited EEG activity. Therefore, our architecture not only decodes images from neural activity but also offers a credible image reconstruction from EEG only, paving the way for, e.g., swift, individualized feedback experiments. Our research represents a significant step forward in connecting neural signals with visual cognition.
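    The knowledge distillation step pairs a hard-label loss with a softened teacher-matching term. Below is a generic, Hinton-style sketch assuming logits from the CNN student and a frozen CLIP-based teacher classifier; the temperature and weighting are illustrative and this is not necessarily the paper's exact objective.

        import torch
        import torch.nn.functional as F

        def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
            """Blend of hard-label cross-entropy and temperature-softened KL to the teacher."""
            hard = F.cross_entropy(student_logits, labels)
            soft = F.kl_div(
                F.log_softmax(student_logits / T, dim=-1),
                F.softmax(teacher_logits / T, dim=-1),
                reduction="batchmean",
            ) * (T * T)
            return alpha * hard + (1 - alpha) * soft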
    Naturalistic Robot Arm Trajectory Generation via Representation Learning. (arXiv:2309.07550v1 [cs.RO])
    The integration of manipulator robots in household environments suggests a need for more predictable and human-like robot motion. This holds especially true for wheelchair-mounted assistive robots that can support the independence of people with paralysis. One method of generating naturalistic motion trajectories is via the imitation of human demonstrators. This paper explores a self-supervised imitation learning method using an autoregressive spatio-temporal graph neural network for an assistive drinking task. We address learning from diverse human motion trajectory data that were captured via wearable IMU sensors on a human arm as the action-free task demonstrations. Observed arm motion data from several participants is used to generate natural and functional drinking motion trajectories for a UR5e robot arm.
    Beta quantile regression for robust estimation of uncertainty in the presence of outliers. (arXiv:2309.07374v1 [cs.LG])
    Quantile Regression (QR) can be used to estimate aleatoric uncertainty in deep neural networks and can generate prediction intervals. Quantifying uncertainty is particularly important in critical applications such as clinical diagnosis, where a realistic assessment of uncertainty is essential in determining disease status and planning the appropriate treatment. The most common application of quantile regression models is in cases where the parametric likelihood cannot be specified. Although quantile regression is quite robust to outlier response observations, it can be sensitive to outlier covariate observations (features). Outlier features can compromise the performance of deep learning regression problems such as style translation, image reconstruction, and deep anomaly detection, potentially leading to misleading conclusions. To address this problem, we propose a robust solution for quantile regression that incorporates concepts from robust divergence. We compare the performance of our proposed method with (i) least trimmed quantile regression and (ii) robust regression based on the regularization of case-specific parameters, on a simple real dataset in the presence of outliers. These methods have not been applied in a deep learning framework. We also demonstrate the applicability of the proposed method by applying it to a medical imaging translation task using diffusion models.
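    For reference, standard quantile regression in deep networks is trained with the pinball loss sketched below; the paper's contribution is a robust, divergence-based variant for handling outlier features, which is not reproduced here.

        import torch

        def pinball_loss(pred, target, q=0.9):
            """Quantile (pinball) loss for a quantile level q in (0, 1)."""
            diff = target - pred
            return torch.mean(torch.maximum(q * diff, (q - 1) * diff))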
    Overview of Human Activity Recognition Using Sensor Data. (arXiv:2309.07170v1 [eess.SP])
    Human activity recognition (HAR) is an essential research field that has been used in different applications including home and workplace automation, security and surveillance as well as healthcare. Starting from conventional machine learning methods to the recently developing deep learning techniques and the Internet of things, significant contributions have been shown in the HAR area in the last decade. Even though several review and survey studies have been published, there is a lack of sensor-based HAR overview studies focusing on summarising the usage of wearable sensors and smart home sensors data as well as applications of HAR and deep learning techniques. Hence, we overview sensor-based HAR, discuss several important applications that rely on HAR, and highlight the most common machine learning methods that have been used for HAR. Finally, several challenges of HAR are explored that should be addressed to further improve the robustness of HAR.
    Tackling the dimensions in imaging genetics with CLUB-PLS. (arXiv:2309.07352v1 [q-bio.GN])
    A major challenge in imaging genetics and similar fields is to link high-dimensional data in one domain, e.g., genetic data, to high-dimensional data in a second domain, e.g., brain imaging data. The standard approach in the area is mass univariate analysis across genetic factors and imaging phenotypes. That entails executing one genome-wide association study (GWAS) for each pre-defined imaging measure. Although this approach has been tremendously successful, one shortcoming is that phenotypes must be pre-defined. Consequently, effects that are not confined to pre-selected regions of interest or that reflect larger brain-wide patterns can easily be missed. In this work we introduce a Partial Least Squares (PLS)-based framework, which we term Cluster-Bootstrap PLS (CLUB-PLS), that can work with large input dimensions in both domains as well as with large sample sizes. One key factor of the framework is the use of cluster bootstrap to provide robust statistics for single input features in both domains. We applied CLUB-PLS to investigate the genetic basis of surface area and cortical thickness in a sample of 33,000 subjects from the UK Biobank. We found 107 genome-wide significant locus-phenotype pairs that are linked to 386 different genes. We found that the vast majority of these loci could be technically validated at a high rate: using classic GWAS or Genome-Wide Inferred Statistics (GWIS), we found that 85 locus-phenotype pairs exceeded the genome-wide suggestive (P<1e-05) threshold.
    Sleep Stage Classification Using a Pre-trained Deep Learning Model. (arXiv:2309.07182v1 [eess.SP])
    Sleep disorders are among the most common human diseases. The classification of sleep stages plays a fundamental role in diagnosing sleep disorders, monitoring treatment effectiveness, and understanding the relationship between sleep stages and various health conditions. A precise and efficient classification of these stages can significantly enhance our understanding of sleep-related phenomena and ultimately lead to improved health outcomes and disease treatment. Previously proposed models are often time-consuming and lack sufficient accuracy, especially in stage N1. The main objective of this research is to present a machine-learning model called "EEGMobile". This model utilizes pre-trained models and learns from electroencephalogram (EEG) spectrograms of brain signals. The model achieved an accuracy of 86.97% on a publicly available dataset named "Sleep-EDF20", outperforming other models proposed by different researchers. Moreover, it recorded an accuracy of 56.4% in stage N1, which is better than that of other models. These findings demonstrate that this model has the potential to achieve better results for the treatment of sleep disorders.
    Benchmarking machine learning models for quantum state classification. (arXiv:2309.07679v1 [quant-ph])
    Quantum computing is a growing field where information is processed by two-level quantum states known as qubits. Current physical realizations of qubits require careful calibration, composed of different experiments, due to noise and decoherence phenomena. Among the different characterization experiments, a crucial step is to develop a model to classify the measured state by discriminating the ground state from the excited state. In these proceedings, we benchmark multiple classification techniques applied to real quantum devices.
    Design of Recognition and Evaluation System for Table Tennis Players' Motor Skills Based on Artificial Intelligence. (arXiv:2309.07141v1 [eess.SP])
    With the rapid development of electronic science and technology, research on wearable devices is constantly advancing; however, current wearable devices cannot yet comprehensively recognize and analyze sport-specific movements. Based on this, this paper improves wearable devices for table tennis and realizes pattern recognition and evaluation of table tennis players' motor skills through artificial intelligence. Firstly, a device is designed to collect the movement information of table tennis players, and the actual movement data are processed. Secondly, a sliding window is used to divide the collected motion data into a feature database of six benchmark table tennis movements. Thirdly, motion features are constructed through feature engineering and, after dimensionality reduction, motor skills are recognized with different models. Finally, a hierarchical evaluation system for motor skills is established using the loss functions of different evaluation indexes. The results show that in the recognition of table tennis players' motor skills, the feature-based BP neural network proposed in this paper has higher recognition accuracy and stronger generalization ability than the traditional convolutional neural network.
    Finding Influencers in Complex Networks: An Effective Deep Reinforcement Learning Approach. (arXiv:2309.07153v1 [cs.SI])
    Maximizing influence in complex networks is a practically important but computationally challenging task for social network analysis, due to its NP-hard nature. Most current approximation or heuristic methods either require tremendous human design effort or achieve unsatisfying balances between effectiveness and efficiency. Recent machine learning attempts only focus on speed but lack performance enhancement. In this paper, different from previous attempts, we propose an effective deep reinforcement learning model that achieves superior performance over the best traditional influence maximization algorithms. Specifically, we design an end-to-end learning framework that combines a graph neural network as the encoder and reinforcement learning as the decoder, named DREIM. Through extensive training on small synthetic graphs, DREIM outperforms state-of-the-art baseline methods on very large synthetic and real-world networks in terms of solution quality, and we also empirically show its linear scalability with regard to network size, which demonstrates its superiority in solving this problem.
    A Deep Dive into Sleep: Single-Channel EEG-Based Sleep Stage Classification with Model Interpretability. (arXiv:2309.07156v1 [eess.SP])
    Sleep, a fundamental physiological process, occupies a significant portion of our lives. Accurate classification of sleep stages serves as a crucial tool for evaluating sleep quality and identifying probable sleep disorders. This work introduces a novel methodology that utilises an SE-ResNet-Bi-LSTM architecture to classify sleep into five separate stages. The classification process is based on the analysis of single-channel electroencephalograms (EEGs). The suggested framework consists of two fundamental elements: a feature extractor that utilises SE-ResNet, and a temporal context encoder that uses stacks of Bi-LSTM units. The effectiveness of our approach is substantiated by thorough assessments conducted on three different datasets, namely SleepEDF-20, SleepEDF-78, and SHHS. Significantly, our methodology attains notable levels of accuracy, specifically 87.5\%, 83.9\%, and 87.8\%, along with macro-F1 scores of 82.5, 78.9, and 81.9 for the corresponding datasets. Notably, we introduce the utilization of 1D-GradCAM visualization to shed light on the decision-making process of our model in the realm of sleep stage classification. This visualization method not only provides valuable insights into the model's classification rationale but also aligns its outcomes with the annotations made by sleep experts. One notable feature of our research is the integration of an expedited training approach, which effectively preserves the model's resilience in terms of performance. The experiments conducted provide a comprehensive assessment of the effectiveness of our proposed model in comparison to existing approaches, highlighting its potential for practical applications.
    Direct Text to Speech Translation System using Acoustic Units. (arXiv:2309.07478v1 [cs.CL])
    This paper proposes a direct text to speech translation system using discrete acoustic units. This framework employs text in different source languages as input to generate speech in the target language without the need for text transcriptions in this language. Motivated by the success of acoustic units in previous works for direct speech to speech translation systems, we use the same pipeline to extract the acoustic units using a speech encoder combined with a clustering algorithm. Once units are obtained, an encoder-decoder architecture is trained to predict them. Then a vocoder generates speech from units. Our approach for direct text to speech translation was tested on the new CVSS corpus with two different text mBART models employed as initialisation. The systems presented report competitive performance for most of the language pairs evaluated. Besides, results show a remarkable improvement when initialising our proposed architecture with a model pre-trained with more languages.
    Generating Parametric BRDFs from Natural Language Descriptions. (arXiv:2306.15679v2 [cs.GR] UPDATED)
    Artistic authoring of 3D environments is a laborious enterprise that also requires skilled content creators. There have been impressive improvements in using machine learning to address different aspects of generating 3D content, such as generating meshes, arranging geometry, synthesizing textures, etc. In this paper we develop a model to generate Bidirectional Reflectance Distribution Functions (BRDFs) from descriptive textual prompts. BRDFs are four dimensional probability distributions that characterize the interaction of light with surface materials. They are either represented parametrically, or by tabulating the probability density associated with every pair of incident and outgoing angles. The former lends itself to artistic editing while the latter is used when measuring the appearance of real materials. Numerous works have focused on hypothesizing BRDF models from images of materials. We learn a mapping from textual descriptions of materials to parametric BRDFs. Our model is first trained using a semi-supervised approach before being tuned via an unsupervised scheme. Although our model is general, in this paper we specifically generate parameters for MDL materials, conditioned on natural language descriptions, within NVIDIA's Omniverse platform. This enables use cases such as real-time text prompts to change materials of objects in 3D environments such as "dull plastic" or "shiny iron". Since the output of our model is a parametric BRDF, rather than an image of the material, it may be used to render materials using any shape under arbitrarily specified viewing and lighting conditions.
    Model Reprogramming: Resource-Efficient Cross-Domain Machine Learning. (arXiv:2202.10629v3 [cs.LG] UPDATED)
    In data-rich domains such as vision, language, and speech, deep learning prevails to deliver high-performance task-specific models and can even learn general task-agnostic representations for efficient finetuning to downstream tasks. However, deep learning in resource-limited domains still faces multiple challenges including (i) limited data, (ii) constrained model development cost, and (iii) lack of adequate pre-trained models for effective finetuning. This paper provides an overview of model reprogramming to bridge this gap. Model reprogramming enables resource-efficient cross-domain machine learning by repurposing and reusing a well-developed pre-trained model from a source domain to solve tasks in a target domain without model finetuning, where the source and target domains can be vastly different. In many applications, model reprogramming outperforms transfer learning and training from scratch. This paper elucidates the methodology of model reprogramming, summarizes existing use cases, provides a theoretical explanation of the success of model reprogramming, and concludes with a discussion on open-ended research questions and opportunities. A list of model reprogramming studies is actively maintained and updated at https://github.com/IBM/model-reprogramming.
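    A typical instantiation of input reprogramming trains only an additive input "program" around a frozen source-domain classifier and maps source labels onto target labels; only the program is updated on target-domain data. The sketch below makes assumed, illustrative choices for the label mapping and aggregation and is not a definitive implementation of any specific reprogramming method.

        import torch
        import torch.nn as nn

        class InputReprogrammer(nn.Module):
            """Frozen source model + trainable additive input program + fixed label mapping."""
            def __init__(self, frozen_model, input_shape, label_map, n_target):
                super().__init__()
                self.model = frozen_model.eval()
                for p in self.model.parameters():
                    p.requires_grad_(False)
                self.delta = nn.Parameter(torch.zeros(*input_shape))   # the learned "program"
                # Many-to-one source-to-target mapping, aggregated by averaging source logits;
                # label_map is assumed to be a LongTensor of target ids, one per source class.
                M = torch.zeros(len(label_map), n_target)
                M[torch.arange(len(label_map)), label_map] = 1.0
                self.register_buffer("mapping", M / M.sum(dim=0).clamp(min=1))

            def forward(self, x):
                return self.model(x + self.delta) @ self.mapping       # target-class logits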
    A Robust SINDy Approach by Combining Neural Networks and an Integral Form. (arXiv:2309.07193v1 [math.DS])
    The discovery of governing equations from data has been an active field of research for decades. One widely used methodology for this purpose is sparse regression for nonlinear dynamics, known as SINDy. Despite several attempts, noisy and scarce data still pose a severe challenge to the success of the SINDy approach. In this work, we discuss a robust method to discover nonlinear governing equations from noisy and scarce data. To do this, we make use of neural networks to learn an implicit representation based on measurement data, so that it not only produces the output in the vicinity of the measurements but also ensures that the time evolution of the output can be described by a dynamical system. Additionally, we learn such a dynamical system in the spirit of the SINDy framework. Leveraging the implicit representation using neural networks, we obtain the derivative information -- required for SINDy -- using an automatic differentiation tool. To enhance the robustness of our methodology, we further incorporate an integral condition on the output of the implicit networks. Furthermore, we extend our methodology to handle data collected from multiple initial conditions. We demonstrate the efficiency of the proposed methodology to discover governing equations under noisy and scarce data regimes by means of several examples and compare its performance with existing methods.
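    The SINDy core that the paper builds on is a sparse regression of time derivatives onto a library of candidate terms. A minimal sketch of the classic sequentially thresholded least-squares step follows; the neural implicit representation, automatic differentiation of derivatives, and integral condition are not reproduced.

        import numpy as np

        def sindy_stlsq(Theta, dXdt, threshold=0.1, n_iter=10):
            """Sparse Xi with dXdt ~= Theta @ Xi via sequentially thresholded least squares."""
            Xi = np.linalg.lstsq(Theta, dXdt, rcond=None)[0]
            for _ in range(n_iter):
                small = np.abs(Xi) < threshold
                Xi[small] = 0.0
                for k in range(dXdt.shape[1]):          # refit each state on its active terms
                    big = ~small[:, k]
                    if big.any():
                        Xi[big, k] = np.linalg.lstsq(Theta[:, big], dXdt[:, k], rcond=None)[0]
            return Xi

        # Toy library for a scalar state x: [1, x, x^2, x^3]
        x = np.linspace(-1.0, 1.0, 200)[:, None]
        Theta = np.hstack([np.ones_like(x), x, x**2, x**3])
        dxdt = 0.5 * x - 2.0 * x**3                     # synthetic derivative data
        print(sindy_stlsq(Theta, dxdt).ravel())         # expect approximately [0, 0.5, 0, -2]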
    SpikeCP: Delay-Adaptive Reliable Spiking Neural Networks via Conformal Prediction. (arXiv:2305.11322v3 [cs.NE] UPDATED)
    Spiking neural networks (SNNs) process time-series data via internal event-driven neural dynamics whose energy consumption depends on the number of spikes exchanged between neurons over the course of the input presentation. In typical implementations of an SNN classifier, decisions are produced after the entire input sequence has been processed, resulting in latency and energy consumption levels that are fairly uniform across inputs. Recently introduced delay-adaptive SNNs tailor the inference latency -- and, with it, the energy consumption -- to the difficulty of each example, by producing an early decision when the SNN model is sufficiently ``confident''. In this paper, we start by observing that, as an SNN processes input samples, its classification decisions tend to be first under-confident and then over-confident with respect to the decision's ground-truth, unknown, test accuracy. This makes it difficult to determine a stopping time that ensures a desired level of accuracy. To address this problem, we introduce a novel delay-adaptive SNN-based inference methodology that, wrapping around any pre-trained SNN classifier, provides guaranteed reliability for the decisions produced at input-dependent stopping times. The approach entails minimal added complexity as compared to the underlying SNN, requiring only thresholding and counting operations at run time, and it leverages tools from conformal prediction (CP).
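    Conformal prediction calibrates a score threshold on held-out data so that prediction sets cover the true label with probability at least 1 - alpha, regardless of the underlying classifier. A minimal split-conformal sketch for softmax-style outputs is below; the paper's delay-adaptive, spike-count-based construction is not reproduced.

        import numpy as np

        def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
            """Split-conformal quantile of the scores 1 - p(true class) on calibration data."""
            scores = 1.0 - cal_probs[np.arange(len(cal_labels)), cal_labels]
            n = len(scores)
            level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
            return np.quantile(scores, level, method="higher")

        def prediction_set(probs, q):
            """All classes whose score 1 - p(class) does not exceed the calibrated threshold."""
            return np.where(1.0 - probs <= q)[0]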
    Tree of Uncertain Thoughts Reasoning for Large Language Models. (arXiv:2309.07694v1 [cs.CL])
    While the recently introduced Tree of Thoughts (ToT) has heralded advancements in allowing Large Language Models (LLMs) to reason through foresight and backtracking for global decision-making, it has overlooked the inherent local uncertainties in intermediate decision points or "thoughts". These local uncertainties, intrinsic to LLMs given their potential for diverse responses, remain a significant concern in the reasoning process. Addressing this pivotal gap, we introduce the Tree of Uncertain Thoughts (TouT) - a reasoning framework tailored for LLMs. Our TouT effectively leverages Monte Carlo Dropout to quantify uncertainty scores associated with LLMs' diverse local responses at these intermediate steps. By marrying this local uncertainty quantification with global search algorithms, TouT enhances the model's precision in response generation. We substantiate our approach with rigorous experiments on two demanding planning tasks: Game of 24 and Mini Crosswords. The empirical evidence underscores TouT's superiority over both ToT and chain-of-thought prompting methods.
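    Monte Carlo Dropout, the uncertainty estimator named above, amounts to keeping dropout stochastic at inference time and aggregating several sampled predictions. A generic PyTorch sketch applied to an arbitrary classifier is shown below; applying it to an LLM's intermediate "thoughts" as in TouT involves additional scoring machinery that is not reproduced.

        import torch
        import torch.nn as nn

        def enable_mc_dropout(model):
            """Put the model in eval mode but keep Dropout layers stochastic."""
            model.eval()
            for m in model.modules():
                if isinstance(m, nn.Dropout):
                    m.train()
            return model

        @torch.no_grad()
        def mc_dropout_scores(model, x, n_samples=20):
            """Mean prediction and per-class standard deviation over dropout samples."""
            enable_mc_dropout(model)
            probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(n_samples)])
            return probs.mean(dim=0), probs.std(dim=0)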
    Equivariant Data Augmentation for Generalization in Offline Reinforcement Learning. (arXiv:2309.07578v1 [cs.LG])
    We present a novel approach to address the challenge of generalization in offline reinforcement learning (RL), where the agent learns from a fixed dataset without any additional interaction with the environment. Specifically, we aim to improve the agent's ability to generalize to out-of-distribution goals. To achieve this, we propose to learn a dynamics model and check if it is equivariant with respect to a fixed type of transformation, namely translations in the state space. We then use an entropy regularizer to increase the equivariant set and augment the dataset with the resulting transformed samples. Finally, we learn a new policy offline based on the augmented dataset, with an off-the-shelf offline RL algorithm. Our experimental results demonstrate that our approach can greatly improve the test performance of the policy on the considered environments.
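    The augmentation relies on the learned dynamics being (approximately) equivariant to translations in state space: shifting a transition and its goal by the same vector should yield another valid transition. A toy sketch of that single operation is below; the entropy-regularized estimation of the equivariant set is not shown.

        import numpy as np

        def translate_transitions(states, actions, next_states, goals, shift):
            """Shift a batch of goal-conditioned transitions by a fixed state-space vector."""
            return states + shift, actions, next_states + shift, goals + shift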
    BAARD: Blocking Adversarial Examples by Testing for Applicability, Reliability and Decidability. (arXiv:2105.00495v2 [cs.LG] UPDATED)
    Adversarial defenses protect machine learning models from adversarial attacks, but are often tailored to one type of model or attack. The lack of information on unknown potential attacks makes detecting adversarial examples challenging. Additionally, attackers do not need to follow the rules made by the defender. To address this problem, we take inspiration from the concept of Applicability Domain in cheminformatics. Cheminformatics models struggle to make accurate predictions because only a limited number of compounds are known and available for training. Applicability Domain defines a domain based on the known compounds and rejects any unknown compound that falls outside the domain. Similarly, adversarial examples start as harmless inputs, but can be manipulated to evade reliable classification by moving outside the domain of the classifier. We are the first to identify the similarity between Applicability Domain and adversarial detection. Instead of focusing on unknown attacks, we focus on what is known, the training data. We propose a simple yet robust triple-stage data-driven framework that checks the input globally and locally, and confirms that they are coherent with the model's output. This framework can be applied to any classification model and is not limited to specific attacks. We demonstrate these three stages work as one unit, effectively detecting various attacks, even for a white-box scenario.
    Joint Community Detection and Rotational Synchronization via Semidefinite Programming. (arXiv:2105.06031v2 [stat.ML] UPDATED)
    In the presence of heterogeneous data, where randomly rotated objects fall into multiple underlying categories, it is challenging to simultaneously classify them into clusters and synchronize them based on pairwise relations. This gives rise to the joint problem of community detection and synchronization. We propose a series of semidefinite relaxations, and prove their exact recovery when extending the celebrated stochastic block model to this new setting where both rotations and cluster identities are to be determined. Numerical experiments demonstrate the efficacy of our proposed algorithms and confirm our theoretical result which indicates a sharp phase transition for exact recovery.
    A Latent Space Theory for Emergent Abilities in Large Language Models. (arXiv:2304.09960v3 [cs.CL] UPDATED)
    Languages are not created randomly but rather to communicate information. There is a strong association between languages and their underlying meanings, resulting in a sparse joint distribution that is heavily peaked according to their correlations. Moreover, these peak values happen to match with the marginal distribution of languages due to the sparsity. With the advent of LLMs trained on big data and large models, we can now precisely assess the marginal distribution of languages, providing a convenient means of exploring the sparse structures in the joint distribution for effective inferences. In this paper, we categorize languages as either unambiguous or $\epsilon$-ambiguous and present quantitative results to demonstrate that the emergent abilities of LLMs, such as language understanding, in-context learning, chain-of-thought prompting, and effective instruction fine-tuning, can all be attributed to Bayesian inference on the sparse joint distribution of languages.
    Federated Dataset Dictionary Learning for Multi-Source Domain Adaptation. (arXiv:2309.07670v1 [cs.LG])
    In this article, we propose an approach for federated domain adaptation, a setting where distributional shift exists among clients and some have unlabeled data. The proposed framework, FedDaDiL, tackles the resulting challenge through dictionary learning of empirical distributions. In our setting, clients' distributions represent particular domains, and FedDaDiL collectively trains a federated dictionary of empirical distributions. In particular, we build upon the Dataset Dictionary Learning framework by designing collaborative communication protocols and aggregation operations. The chosen protocols keep clients' data private, thus enhancing overall privacy compared to its centralized counterpart. We empirically demonstrate that our approach successfully generates labeled data on the target domain with extensive experiments on (i) Caltech-Office, (ii) TEP, and (iii) CWRU benchmarks. Furthermore, we compare our method to its centralized counterpart and other benchmarks in federated domain adaptation.
    Scalable Bayesian optimization with high-dimensional outputs using randomized prior networks. (arXiv:2302.07260v5 [cs.LG] UPDATED)
    Several fundamental problems in science and engineering consist of global optimization tasks involving unknown high-dimensional (black-box) functions that map a set of controllable variables to the outcomes of an expensive experiment. Bayesian Optimization (BO) techniques are known to be effective in tackling global optimization problems using a relatively small number of objective function evaluations, but their performance suffers when dealing with high-dimensional outputs. To overcome the major challenge of dimensionality, here we propose a deep learning framework for BO and sequential decision making based on bootstrapped ensembles of neural architectures with randomized priors. Using appropriate architecture choices, we show that the proposed framework can approximate functional relationships between design variables and quantities of interest, even in cases where the latter take values in high-dimensional vector spaces or even infinite-dimensional function spaces. In the context of BO, we augment the proposed probabilistic surrogates with re-parameterized Monte Carlo approximations of multiple-point (parallel) acquisition functions, as well as methodological extensions for accommodating black-box constraints and multi-fidelity information sources. We test the proposed framework against state-of-the-art methods for BO and demonstrate superior performance across several challenging tasks with high-dimensional outputs, including a constrained multi-fidelity optimization task involving shape optimization of rotor blades in turbo-machinery.
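    Randomized prior networks, the ensemble ingredient named above, add the output of a frozen, randomly initialized "prior" network to a trainable network, so that ensemble disagreement persists away from the training data. A small sketch with assumed layer sizes follows; the bootstrapping, parallel acquisition functions, and multi-fidelity extensions are not included.

        import torch
        import torch.nn as nn

        class RandomizedPriorMember(nn.Module):
            """One ensemble member: trainable net plus a frozen random prior, weighted by beta."""
            def __init__(self, d_in, d_out, width=64, beta=1.0):
                super().__init__()
                def mlp():
                    return nn.Sequential(nn.Linear(d_in, width), nn.Tanh(),
                                         nn.Linear(width, width), nn.Tanh(),
                                         nn.Linear(width, d_out))
                self.trainable, self.prior, self.beta = mlp(), mlp(), beta
                for p in self.prior.parameters():
                    p.requires_grad_(False)

            def forward(self, x):
                return self.trainable(x) + self.beta * self.prior(x)

        ensemble = [RandomizedPriorMember(3, 2) for _ in range(8)]    # 8 bootstrapped members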
    On the complexity of finding a local minimizer of a quadratic function over a polytope. (arXiv:2008.05558v5 [math.OC] UPDATED)
    We show that unless P=NP, there cannot be a polynomial-time algorithm that finds a point within Euclidean distance $c^n$ (for any constant $c \ge 0$) of a local minimizer of an $n$-variate quadratic function over a polytope. This result (even with $c=0$) answers a question of Pardalos and Vavasis that appeared in 1992 on a list of seven open problems in complexity theory for numerical optimization. Our proof technique also implies that the problem of deciding whether a quadratic function has a local minimizer over an (unbounded) polyhedron, and that of deciding if a quartic polynomial has a local minimizer are NP-hard.
    Physics-constrained robust learning of open-form PDEs from limited and noisy data. (arXiv:2309.07672v1 [cs.LG])
    Unveiling the underlying governing equations of nonlinear dynamic systems remains a significant challenge, especially when encountering noisy observations and no prior knowledge available. This study proposes R-DISCOVER, a framework designed to robustly uncover open-form partial differential equations (PDEs) from limited and noisy data. The framework operates through two alternating update processes: discovering and embedding. The discovering phase employs symbolic representation and a reinforcement learning (RL)-guided hybrid PDE generator to efficiently produce diverse open-form PDEs with tree structures. A neural network-based predictive model fits the system response and serves as the reward evaluator for the generated PDEs. PDEs with superior fits are utilized to iteratively optimize the generator via the RL method and the best-performing PDE is selected by a parameter-free stability metric. The embedding phase integrates the initially identified PDE from the discovering process as a physical constraint into the predictive model for robust training. The traversal of PDE trees automates the construction of the computational graph and the embedding process without human intervention. Numerical experiments demonstrate our framework's capability to uncover governing equations from nonlinear dynamic systems with limited and highly noisy data and outperform other physics-informed neural network-based discovery methods. This work opens new potential for exploring real-world systems with limited understanding.
    TrojViT: Trojan Insertion in Vision Transformers. (arXiv:2208.13049v4 [cs.LG] UPDATED)
    Vision Transformers (ViTs) have demonstrated state-of-the-art performance in various vision-related tasks. The success of ViTs motivates adversaries to perform backdoor attacks on ViTs. Although the vulnerability of traditional CNNs to backdoor attacks is well-known, backdoor attacks on ViTs are seldom studied. Compared to CNNs capturing pixel-wise local features by convolutions, ViTs extract global context information through patches and attentions. Na\"ively transplanting CNN-specific backdoor attacks to ViTs yields only a low clean data accuracy and a low attack success rate. In this paper, we propose a stealthy and practical ViT-specific backdoor attack, $TrojViT$. Rather than an area-wise trigger used by CNN-specific backdoor attacks, TrojViT generates a patch-wise trigger designed to build a Trojan composed of some vulnerable bits on the parameters of a ViT stored in DRAM memory through patch salience ranking and attention-target loss. TrojViT further uses minimum-tuned parameter update to reduce the bit number of the Trojan. Once the attacker inserts the Trojan into the ViT model by flipping the vulnerable bits, the ViT model still produces normal inference accuracy with benign inputs. But when the attacker embeds a trigger into an input, the ViT model is forced to classify the input to a predefined target class. We show that flipping only a few vulnerable bits identified by TrojViT on a ViT model using the well-known RowHammer can transform the model into a backdoored one. We perform extensive experiments on multiple datasets and various ViT models. TrojViT can classify $99.64\%$ of test images to a target class by flipping $345$ bits on a ViT for ImageNet. Our code is available at https://github.com/mxzheng/TrojViT
    Safe and Accelerated Deep Reinforcement Learning-based O-RAN Slicing: A Hybrid Transfer Learning Approach. (arXiv:2309.07265v1 [cs.NI])
    The open radio access network (O-RAN) architecture supports intelligent network control algorithms as one of its core capabilities. Data-driven applications incorporate such algorithms to optimize radio access network (RAN) functions via RAN intelligent controllers (RICs). Deep reinforcement learning (DRL) algorithms are among the main approaches adopted in the O-RAN literature to solve dynamic radio resource management problems. However, despite the benefits introduced by the O-RAN RICs, the practical adoption of DRL algorithms in real network deployments falls behind. This is primarily due to the slow convergence and unstable performance exhibited by DRL agents upon deployment and when facing previously unseen network conditions. In this paper, we address these challenges by proposing transfer learning (TL) as a core component of the training and deployment workflows for the DRL-based closed-loop control of O-RAN functionalities. To this end, we propose and design a hybrid TL-aided approach that leverages the advantages of both policy reuse and distillation TL methods to provide safe and accelerated convergence in DRL-based O-RAN slicing. We conduct a thorough experiment that accommodates multiple services, including real VR gaming traffic to reflect practical scenarios of O-RAN slicing. We also propose and implement policy reuse and distillation-aided DRL and non-TL-aided DRL as three separate baselines. The proposed hybrid approach yields at least a 7.7% improvement in the average initial reward value, a 20.7% improvement in the percentage of converged scenarios, and a 64.6% decrease in reward variance, while maintaining fast convergence and enhancing generalizability compared with the baselines.
    Self-Supervised Blind Source Separation via Multi-Encoder Autoencoders. (arXiv:2309.07138v1 [eess.SP])
    The task of blind source separation (BSS) involves separating sources from a mixture without prior knowledge of the sources or the mixing system. This is a challenging problem that often requires making restrictive assumptions about both the mixing system and the sources. In this paper, we propose a novel method for addressing BSS of non-linear mixtures by leveraging the natural feature subspace specialization ability of multi-encoder autoencoders with fully self-supervised learning without strong priors. During the training phase, our method unmixes the input into the separate encoding spaces of the multi-encoder network and then remixes these representations within the decoder for a reconstruction of the input. Then to perform source inference, we introduce a novel encoding masking technique whereby masking out all but one of the encodings enables the decoder to estimate a source signal. To this end, we also introduce a so-called pathway separation loss that encourages sparsity between the unmixed encoding spaces throughout the decoder's layers and a so-called zero reconstruction loss on the decoder for coherent source estimations. In order to carefully evaluate our method, we conduct experiments on a toy dataset and with real-world biosignal recordings from a polysomnography sleep study for extracting respiration.
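    A minimal sketch of the encoding-masking mechanism, under simplifying assumptions (fully connected encoders and decoder in PyTorch, two sources, and with the pathway-separation and zero-reconstruction losses omitted): during training the concatenated encodings reconstruct the mixture, while at inference zeroing all but one encoding asks the decoder for that source's contribution.

```python
import torch
import torch.nn as nn

class MultiEncoderAE(nn.Module):
    def __init__(self, d_in=128, d_code=16, n_sources=2):
        super().__init__()
        self.encoders = nn.ModuleList(
            nn.Sequential(nn.Linear(d_in, 64), nn.ReLU(), nn.Linear(64, d_code))
            for _ in range(n_sources))
        self.decoder = nn.Sequential(
            nn.Linear(d_code * n_sources, 64), nn.ReLU(), nn.Linear(64, d_in))

    def forward(self, x, keep=None):
        codes = [enc(x) for enc in self.encoders]
        if keep is not None:                       # encoding masking at inference time
            codes = [c if i == keep else torch.zeros_like(c)
                     for i, c in enumerate(codes)]
        return self.decoder(torch.cat(codes, dim=-1))

model = MultiEncoderAE()
mixture = torch.randn(8, 128)
recon = model(mixture)                      # training target: reconstruct the mixture
source0_estimate = model(mixture, keep=0)   # mask all but encoder 0 to infer one source
```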
    Dataset Size Dependence of Rate-Distortion Curve and Threshold of Posterior Collapse in Linear VAE. (arXiv:2309.07663v1 [stat.ML])
    In the Variational Autoencoder (VAE), the variational posterior often aligns closely with the prior, which is known as posterior collapse and hinders the quality of representation learning. To mitigate this problem, an adjustable hyperparameter beta has been introduced in the VAE. This paper presents a closed-form expression to assess the relationship between the beta in VAE, the dataset size, the posterior collapse, and the rate-distortion curve by analyzing a minimal VAE in a high-dimensional limit. These results clarify that a long plateau in the generalization error emerges with a relatively larger beta. As the beta increases, the length of the plateau extends and then becomes infinite beyond a certain beta threshold. This implies that the choice of beta, unlike the usual regularization parameters, can induce posterior collapse regardless of the dataset size. Thus, beta is a risky parameter that requires careful tuning. Furthermore, considering the dataset-size dependence of the rate-distortion curve, a relatively large dataset is required to obtain a rate-distortion curve with high rates. Extensive numerical experiments support our analysis.
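    For concreteness, a minimal sketch of how the beta hyperparameter enters the objective of such a VAE (PyTorch, with a Gaussian decoder assumed for the reconstruction term): beta scales the KL (rate) term against the reconstruction (distortion) term, and driving the rate to zero is exactly the posterior collapse discussed above.

```python
import torch
import torch.nn.functional as F

def beta_vae_loss(x, x_recon, mu, logvar, beta):
    """beta-weighted ELBO: distortion + beta * KL(q(z|x) || N(0, I)), averaged over the batch."""
    distortion = F.mse_loss(x_recon, x, reduction="sum") / x.shape[0]
    rate = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp()) / x.shape[0]
    return distortion + beta * rate, distortion, rate

# A collapsed posterior shows up as rate ~ 0: q(z|x) matches the prior for every x.
x = torch.randn(16, 10)
mu, logvar = torch.zeros(16, 4), torch.zeros(16, 4)      # q(z|x) = N(0, I) regardless of x
loss, distortion, rate = beta_vae_loss(x, torch.zeros_like(x), mu, logvar, beta=4.0)
print(rate)  # tensor(0.) -- the rate term vanishes under collapse
```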
    Efficient quantum recurrent reinforcement learning via quantum reservoir computing. (arXiv:2309.07339v1 [quant-ph])
    Quantum reinforcement learning (QRL) has emerged as a framework to solve sequential decision-making tasks, showcasing empirical quantum advantages. A notable development is through quantum recurrent neural networks (QRNNs) for memory-intensive tasks such as partially observable environments. However, QRL models incorporating QRNN encounter challenges such as inefficient training of QRL with QRNN, given that the computation of gradients in QRNN is both computationally expensive and time-consuming. This work presents a novel approach to address this challenge by constructing QRL agents utilizing QRNN-based reservoirs, specifically employing quantum long short-term memory (QLSTM). QLSTM parameters are randomly initialized and fixed without training. The model is trained using the asynchronous advantage actor-critic (A3C) algorithm. Through numerical simulations, we validate the efficacy of our QLSTM-Reservoir RL framework. Its performance is assessed on standard benchmarks, demonstrating comparable results to a fully trained QLSTM RL model with identical architecture and training settings.
    Speeding up Learning Quantum States through Group Equivariant Convolutional Quantum Ans\"atze. (arXiv:2112.07611v3 [quant-ph] UPDATED)
    We develop a theoretical framework for $S_n$-equivariant convolutional quantum circuits with SU$(d)$-symmetry, building on and significantly generalizing Jordan's Permutational Quantum Computing (PQC) formalism based on Schur-Weyl duality connecting both SU$(d)$ and $S_n$ actions on qudits. In particular, we utilize the Okounkov-Vershik approach to prove Harrow's statement (Ph.D. Thesis 2005 p.160) on the equivalence between $\operatorname{SU}(d)$ and $S_n$ irrep bases and to establish the $S_n$-equivariant Convolutional Quantum Alternating Ans\"atze ($S_n$-CQA) using Young-Jucys-Murphy (YJM) elements. We prove that $S_n$-CQA is able to generate any unitary in any given $S_n$ irrep sector, which may serve as a universal model for a wide array of quantum machine learning problems with the presence of SU($d$) symmetry. Our method provides another way to prove the universality of Quantum Approximate Optimization Algorithm (QAOA) and verifies that 4-local SU($d$) symmetric unitaries are sufficient to build generic SU($d$) symmetric quantum circuits up to relative phase factors. We present numerical simulations to showcase the effectiveness of the ans\"atze to find the ground state energy of the $J_1$--$J_2$ antiferromagnetic Heisenberg model on the rectangular and Kagome lattices. Our work provides the first application of the celebrated Okounkov-Vershik $S_n$ representation theory to quantum physics and machine learning, from which we propose quantum variational ans\"atze that are strongly suggested to be classically intractable and are tailored towards specific optimization problems.
    Proximal Bellman mappings for reinforcement learning and their application to robust adaptive filtering. (arXiv:2309.07548v1 [eess.SP])
    This paper aims at the algorithmic/theoretical core of reinforcement learning (RL) by introducing the novel class of proximal Bellman mappings. These mappings are defined in reproducing kernel Hilbert spaces (RKHSs) to benefit from their rich approximation properties and inner product; they are shown to belong to the powerful Hilbertian family of (firmly) nonexpansive mappings, regardless of the values of their discount factors, and they possess ample degrees of design freedom to reproduce attributes of the classical Bellman mappings and to pave the way for novel RL designs. An approximate policy-iteration scheme is built on the proposed class of mappings to solve the problem of selecting online, at every time instance, the "optimal" exponent $p$ in a $p$-norm loss to combat outliers in linear adaptive filtering, without training data and any knowledge on the statistical properties of the outliers. Numerical tests on synthetic data showcase the superior performance of the proposed framework over several non-RL and kernel-based RL schemes.
    Efficient Learning of PDEs via Taylor Expansion and Sparse Decomposition into Value and Fourier Domains. (arXiv:2309.07344v1 [cs.LG])
    Accelerating the learning of Partial Differential Equations (PDEs) from experimental data will speed up the pace of scientific discovery. Previous randomized algorithms exploit sparsity in PDE updates for acceleration. However, such methods are applicable to a limited class of decomposable PDEs, which have sparse features in the value domain. We propose Reel, which accelerates the learning of PDEs via random projection and has much broader applicability. Reel exploits the sparsity by decomposing dense updates into sparse ones in both the value and frequency domains. This decomposition enables efficient learning when the source of the updates consists of gradually changing terms across large areas (sparse in the frequency domain) in addition to a few rapid updates concentrated in a small set of "interfacial" regions (sparse in the value domain). Random projection is then applied to compress the sparse signals for learning. To expand the model applicability, Taylor series expansion is used in Reel to approximate the nonlinear PDE updates with polynomials in the decomposable form. Theoretically, we derive a constant factor approximation between the projected loss function and the original one with a poly-logarithmic number of projected dimensions. Experimentally, we provide empirical evidence that our proposed Reel can lead to faster learning of PDE models (70-98% reduction in training time when the data is compressed to 1% of its original size) with comparable quality as the non-compressed models.
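    A toy sketch of the decomposition-plus-projection step described above (an illustrative numpy example; the thresholding rule, sketch size, and signals are made up, not Reel's actual pipeline): an update with a smooth background and a sharp interfacial spike is split into a frequency-sparse part and a value-sparse residual, and both are compressed with a Gaussian random projection before learning.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 1024
x = np.linspace(0.0, 1.0, d)

# Toy "PDE update": slowly varying background (sparse in frequency)
# plus a sharp interfacial spike (sparse in value).
update = np.sin(2 * np.pi * x) + 0.5 * np.cos(4 * np.pi * x)
update[500:508] += 5.0

spectrum = np.fft.rfft(update)
keep = np.argsort(np.abs(spectrum))[-4:]          # keep the dominant frequencies
sparse_spectrum = np.zeros_like(spectrum)
sparse_spectrum[keep] = spectrum[keep]
smooth = np.fft.irfft(sparse_spectrum, n=d)       # frequency-sparse component
spiky = update - smooth                           # residual concentrated at the interface

k = 64                                            # sketch dimension, k << d
P = rng.standard_normal((k, d)) / np.sqrt(k)      # Gaussian random projection
compressed = np.concatenate([P @ smooth, P @ spiky])   # learning operates on the sketch
print(compressed.shape)                           # (128,)
```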
    Is Solving Graph Neural Tangent Kernel Equivalent to Training Graph Neural Network?. (arXiv:2309.07452v1 [cs.LG])
    A rising trend in theoretical deep learning is to understand why deep learning works through Neural Tangent Kernel (NTK) [jgh18], a kernel method that is equivalent to using gradient descent to train a multi-layer infinitely-wide neural network. NTK is a major step forward in the theoretical deep learning because it allows researchers to use traditional mathematical tools to analyze properties of deep neural networks and to explain various neural network techniques from a theoretical view. A natural extension of NTK on graph learning is \textit{Graph Neural Tangent Kernel (GNTK)}, and researchers have already provided a GNTK formulation for graph-level regression and shown empirically that this kernel method can achieve similar accuracy as GNNs on various bioinformatics datasets [dhs+19]. The remaining question now is whether solving GNTK regression is equivalent to training an infinitely-wide multi-layer GNN using gradient descent. In this paper, we provide three new theoretical results. First, we formally prove this equivalence for graph-level regression. Second, we present the first GNTK formulation for node-level regression. Finally, we prove the equivalence for node-level regression.
    Rates of Convergence in Certain Native Spaces of Approximations used in Reinforcement Learning. (arXiv:2309.07383v1 [eess.SY])
    This paper studies convergence rates for some value function approximations that arise in a collection of reproducing kernel Hilbert spaces (RKHS) $H(\Omega)$. By casting an optimal control problem in a specific class of native spaces, strong rates of convergence are derived for the operator equation that enables offline approximations that appear in policy iteration. Explicit upper bounds on error in value function approximations are derived in terms of the power function $\mathcal{P}_{H,N}$ for the space of finite-dimensional approximants $H_N$ in the native space $H(\Omega)$. These bounds are geometric in nature and refine some well-known, now classical results concerning convergence of approximations of value functions.
    GBE-MLZSL: A Group Bi-Enhancement Framework for Multi-Label Zero-Shot Learning. (arXiv:2309.00923v2 [cs.CV] UPDATED)
    This paper investigates a challenging problem of zero-shot learning in the multi-label scenario (MLZSL), wherein the model is trained to recognize multiple unseen classes within a sample (e.g., an image) based on seen classes and auxiliary knowledge, e.g., semantic information. Existing methods usually resort to analyzing the relationship of various seen classes residing in a sample from the dimension of spatial or semantic characteristics, and transfer the learned model to unseen ones. But they ignore the effective integration of local and global features. That is, in the process of inferring unseen classes, global features represent the principal direction of the image in the feature space, while local features should maintain uniqueness within a certain range. Neglecting this integration makes the model lose its grasp of the main components of the image. Relying only on the local existence of seen classes during the inference stage introduces unavoidable bias. In this paper, we propose a novel and effective group bi-enhancement framework for MLZSL, dubbed GBE-MLZSL, to fully make use of such properties and enable a more accurate and robust visual-semantic projection. Specifically, we split the feature maps into several feature groups, of which each feature group can be trained independently with the Local Information Distinguishing Module (LID) to ensure uniqueness. Meanwhile, a Global Enhancement Module (GEM) is designed to preserve the principal direction. Besides, a static graph structure is designed to construct the correlation of local features. Experiments on large-scale MLZSL benchmark datasets NUS-WIDE and Open-Images-v4 demonstrate that the proposed GBE-MLZSL outperforms other state-of-the-art methods with large margins.
    ETP: Learning Transferable ECG Representations via ECG-Text Pre-training. (arXiv:2309.07145v1 [eess.SP])
    In the domain of cardiovascular healthcare, the Electrocardiogram (ECG) serves as a critical, non-invasive diagnostic tool. Although recent strides in self-supervised learning (SSL) have been promising for ECG representation learning, these techniques often require annotated samples and struggle with classes not present in the fine-tuning stages. To address these limitations, we introduce ECG-Text Pre-training (ETP), an innovative framework designed to learn cross-modal representations that link ECG signals with textual reports. For the first time, this framework leverages the zero-shot classification task in the ECG domain. ETP employs an ECG encoder along with a pre-trained language model to align ECG signals with their corresponding textual reports. The proposed framework excels in both linear evaluation and zero-shot classification tasks, as demonstrated on the PTB-XL and CPSC2018 datasets, showcasing its ability for robust and generalizable cross-modal ECG feature learning.
    Bringing PDEs to JAX with forward and reverse modes automatic differentiation. (arXiv:2309.07137v1 [cs.MS])
    Partial differential equations (PDEs) are used to describe a variety of physical phenomena. Often these equations do not have analytical solutions and numerical approximations are used instead. One of the common methods to solve PDEs is the finite element method. Computing derivative information of the solution with respect to the input parameters is important in many tasks in scientific computing. We extend the JAX automatic differentiation library with an interface to the Firedrake finite element library. The high-level symbolic representation of PDEs allows bypassing differentiation through the low-level, and possibly numerous, iterations of the underlying nonlinear solvers. Differentiating through Firedrake solvers is done using tangent-linear and adjoint equations. This enables the efficient composition of finite element solvers with arbitrary differentiable programs. The code is available at github.com/IvanYashchuk/jax-firedrake.
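    The key mechanism is that differentiation goes through the tangent-linear and adjoint equations of the solver rather than through its iterations. A language-agnostic sketch of that idea (an illustration in PyTorch, not the paper's JAX/Firedrake interface) for a generic linear system A u = b: the backward pass solves a single adjoint system with A^T.

```python
import torch

class LinearSolve(torch.autograd.Function):
    """Differentiable solve of A u = b; the backward pass uses the adjoint equation."""

    @staticmethod
    def forward(ctx, A, b):
        u = torch.linalg.solve(A, b)
        ctx.save_for_backward(A, u)
        return u

    @staticmethod
    def backward(ctx, grad_u):
        A, u = ctx.saved_tensors
        lam = torch.linalg.solve(A.transpose(-1, -2), grad_u)   # adjoint solve: A^T lam = grad_u
        grad_b = lam
        grad_A = -lam.unsqueeze(-1) @ u.unsqueeze(-2)           # d loss / dA_ij = -lam_i * u_j
        return grad_A, grad_b

A = (5 * torch.eye(4) + torch.randn(4, 4)).requires_grad_()
b = torch.randn(4, requires_grad=True)
u = LinearSolve.apply(A, b)
u.sum().backward()            # gradients w.r.t. A and b obtained from one adjoint solve
print(A.grad.shape, b.grad.shape)
```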
    Gibbs-Duhem-Informed Neural Networks for Binary Activity Coefficient Prediction. (arXiv:2306.07937v2 [physics.chem-ph] UPDATED)
    We propose Gibbs-Duhem-informed neural networks for the prediction of binary activity coefficients at varying compositions. That is, we include the Gibbs-Duhem equation explicitly in the loss function for training neural networks, which is straightforward in standard machine learning (ML) frameworks enabling automatic differentiation. In contrast to recent hybrid ML approaches, our approach does not rely on embedding a specific thermodynamic model inside the neural network and corresponding prediction limitations. Rather, Gibbs-Duhem consistency serves as regularization, with the flexibility of ML models being preserved. Our results show increased thermodynamic consistency and generalization capabilities for activity coefficient predictions by Gibbs-Duhem-informed graph neural networks and matrix completion methods. We also find that the model architecture, particularly the activation function, can have a strong influence on the prediction quality. The approach can be easily extended to account for other thermodynamic consistency conditions.
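    A minimal sketch of what such a loss term can look like (an illustration in PyTorch; the network, penalty weight, and placeholder labels are made up): for a binary mixture at constant temperature and pressure, the Gibbs-Duhem relation x1 d ln(gamma1)/dx1 + x2 d ln(gamma2)/dx1 = 0 is evaluated by automatic differentiation and its squared residual is added to the data loss as a regularizer.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 2))  # x1 -> (ln gamma1, ln gamma2)

def gibbs_duhem_penalty(model, x1):
    """Squared residual of x1 * d ln(g1)/dx1 + (1 - x1) * d ln(g2)/dx1 = 0 at constant T and p."""
    x1 = x1.clone().requires_grad_(True)
    ln_g = model(x1)
    d_ln_g1 = torch.autograd.grad(ln_g[:, 0].sum(), x1, create_graph=True)[0]
    d_ln_g2 = torch.autograd.grad(ln_g[:, 1].sum(), x1, create_graph=True)[0]
    residual = x1 * d_ln_g1 + (1.0 - x1) * d_ln_g2
    return (residual ** 2).mean()

# One training step: data loss on (placeholder) activity-coefficient labels plus the consistency term.
x1 = torch.rand(64, 1)
targets = torch.zeros(64, 2)                       # placeholder labels, illustration only
loss = ((model(x1) - targets) ** 2).mean() + 1.0 * gibbs_duhem_penalty(model, x1)
loss.backward()
```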
    MELAGE: A purely python based Neuroimaging software (Neonatal). (arXiv:2309.07175v1 [eess.IV])
    MELAGE, a pioneering Python-based neuroimaging software, emerges as a versatile tool for the visualization, processing, and analysis of medical images. Initially conceived to address the unique challenges of processing 3D ultrasound and MRI brain images during the neonatal period, MELAGE exhibits remarkable adaptability, extending its utility to the domain of adult human brain imaging. At its core, MELAGE features a semi-automatic brain extraction tool empowered by a deep learning module, ensuring precise and efficient brain structure extraction from MRI and 3D Ultrasound data. Moreover, MELAGE offers a comprehensive suite of features, encompassing dynamic 3D visualization, accurate measurements, and interactive image segmentation. This transformative software holds immense promise for researchers and clinicians, offering streamlined image analysis, seamless integration with deep learning algorithms, and broad applicability in the realm of medical imaging.
    EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought. (arXiv:2305.15021v2 [cs.RO] UPDATED)
    Embodied AI is a crucial frontier in robotics, capable of planning and executing action sequences for robots to accomplish long-horizon tasks in physical environments. In this work, we introduce EmbodiedGPT, an end-to-end multi-modal foundation model for embodied AI, empowering embodied agents with multi-modal understanding and execution capabilities. To achieve this, we have made the following efforts: (i) We craft a large-scale embodied planning dataset, termed EgoCOT. The dataset consists of carefully selected videos from the Ego4D dataset, along with corresponding high-quality language instructions. Specifically, we generate a sequence of sub-goals with the "Chain of Thoughts" mode for effective embodied planning. (ii) We introduce an efficient training approach to EmbodiedGPT for high-quality plan generation, by adapting a 7B large language model (LLM) to the EgoCOT dataset via prefix tuning. (iii) We introduce a paradigm for extracting task-related features from LLM-generated planning queries to form a closed loop between high-level planning and low-level control. Extensive experiments show the effectiveness of EmbodiedGPT on embodied tasks, including embodied planning, embodied control, visual captioning, and visual question answering. Notably, EmbodiedGPT significantly enhances the success rate of the embodied control task by extracting more effective features. It has achieved a remarkable 1.6 times increase in success rate on the Franka Kitchen benchmark and a 1.3 times increase on the Meta-World benchmark, compared to the BLIP-2 baseline fine-tuned with the Ego4D dataset.
    EpiDeNet: An Energy-Efficient Approach to Seizure Detection for Embedded Systems. (arXiv:2309.07135v1 [eess.SP])
    Epilepsy is a prevalent neurological disorder that affects millions of individuals globally, and continuous monitoring coupled with automated seizure detection appears as a necessity for effective patient treatment. To enable long-term care in daily-life conditions, comfortable and smart wearable devices with long battery life are required, which in turn set the demand for resource-constrained and energy-efficient computing solutions. In this context, the development of machine learning algorithms for seizure detection faces the challenge of heavily imbalanced datasets. This paper introduces EpiDeNet, a new lightweight seizure detection network, and Sensitivity-Specificity Weighted Cross-Entropy (SSWCE), a new loss function that incorporates sensitivity and specificity, to address the challenge of heavily imbalanced datasets. The proposed EpiDeNet-SSWCE approach demonstrates the successful detection of 91.16% and 92.00% seizure events on two different datasets (CHB-MIT and PEDESITE, respectively), with only four EEG channels. A three-window majority voting-based smoothing scheme combined with the SSWCE loss achieves a 3x reduction in false positives to 1.18 FP/h. EpiDeNet is well suited for implementation on low-power embedded platforms, and we evaluate its performance on two ARM Cortex-based platforms (M4F/M7) and two parallel ultra-low power (PULP) systems (GAP8, GAP9). The most efficient implementation (GAP9) achieves an energy efficiency of 40 GMAC/s/W, with an energy consumption per inference of only 0.051 mJ at high performance (726.46 MMAC/s), outperforming the best ARM Cortex-based solutions by approximately 160x in energy efficiency. The EpiDeNet-SSWCE method demonstrates effective and accurate seizure detection performance on heavily imbalanced datasets, while being suited for implementation on energy-constrained platforms.
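    The abstract does not spell out the SSWCE formula, so the following is only a plausible reading (a hedged sketch in PyTorch, not the authors' definition): a binary cross-entropy in which the positive-class term is weighted to favor sensitivity and the negative-class term to favor specificity, which counters the class imbalance described above.

```python
import torch

def sswce_loss(logits, targets, w_sens=2.0, w_spec=1.0):
    """Hypothetical sensitivity/specificity-weighted cross-entropy: w_sens up-weights
    missed seizures (hurting sensitivity), w_spec up-weights false alarms (hurting specificity)."""
    p = torch.sigmoid(logits)
    eps = 1e-7
    pos_term = -w_sens * targets * torch.log(p + eps)
    neg_term = -w_spec * (1.0 - targets) * torch.log(1.0 - p + eps)
    return (pos_term + neg_term).mean()

logits = torch.randn(256)
labels = (torch.rand(256) > 0.95).float()   # heavily imbalanced toy labels
print(sswce_loss(logits, labels))
```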
    Detecting Unknown Attacks in IoT Environments: An Open Set Classifier for Enhanced Network Intrusion Detection. (arXiv:2309.07461v1 [cs.CR])
    The widespread integration of Internet of Things (IoT) devices across all facets of life has ushered in an era of interconnectedness, creating new avenues for cybersecurity challenges and underscoring the need for robust intrusion detection systems. However, traditional security systems are designed with a closed-world perspective and often face challenges in dealing with the ever-evolving threat landscape, where new and unfamiliar attacks are constantly emerging. In this paper, we introduce a framework aimed at mitigating the open set recognition (OSR) problem in the realm of Network Intrusion Detection Systems (NIDS) tailored for IoT environments. Our framework capitalizes on image-based representations of packet-level data, extracting spatial and temporal patterns from network traffic. Additionally, we integrate stacking and sub-clustering techniques, enabling the identification of unknown attacks by effectively modeling the complex and diverse nature of benign behavior. The empirical results prominently underscore the framework's efficacy, boasting an impressive 88\% detection rate for previously unseen attacks when compared against existing approaches and recent advancements. Future work will perform extensive experimentation across various openness levels and attack scenarios, further strengthening the adaptability and performance of our proposed solution in safeguarding IoT environments.
    Choosing a Proxy Metric from Past Experiments. (arXiv:2309.07893v1 [stat.ME])
    In many randomized experiments, the treatment effect of the long-term metric (i.e. the primary outcome of interest) is often difficult or infeasible to measure. Such long-term metrics are often slow to react to changes and sufficiently noisy that they are challenging to estimate faithfully in short-horizon experiments. A common alternative is to measure several short-term proxy metrics in the hope they closely track the long-term metric -- so they can be used to effectively guide decision-making in the near-term. We introduce a new statistical framework to both define and construct an optimal proxy metric for use in a homogeneous population of randomized experiments. Our procedure first reduces the construction of an optimal proxy metric in a given experiment to a portfolio optimization problem which depends on the true latent treatment effects and the noise level of the experiment under consideration. We then denoise the observed treatment effects of the long-term metric and a set of proxies in a historical corpus of randomized experiments to extract estimates of the latent treatment effects for use in the optimization problem. One key insight derived from our approach is that the optimal proxy metric for a given experiment is not fixed a priori; rather, it should depend on the sample size (or effective noise level) of the randomized experiment for which it is deployed. To instantiate and evaluate our framework, we employ our methodology in a large corpus of randomized experiments from an industrial recommendation system and construct proxy metrics that perform favorably relative to several baselines.
    Attention-based Dynamic Graph Convolutional Recurrent Neural Network for Traffic Flow Prediction in Highway Transportation. (arXiv:2309.07196v1 [cs.LG])
    As one of the important tools for spatial feature extraction, graph convolution has been applied in a wide range of fields such as traffic flow prediction. However, current popular graph convolution works cannot guarantee spatio-temporal consistency over long periods. Ignoring correlational dynamics, convolutional locality and temporal comprehensiveness limits predictive accuracy. In this paper, a novel Attention-based Dynamic Graph Convolutional Recurrent Neural Network (ADGCRNN) is proposed to improve traffic flow prediction in highway transportation. Three temporal resolutions of data sequence are effectively integrated by self-attention to extract characteristics; multi-dynamic graphs and their weights are dynamically created to flexibly combine the varying characteristics; a dedicated gated kernel emphasizing highly relevant nodes is introduced on these complete graphs to reduce overfitting in graph convolution operations. Experiments on two public datasets show our work performs better than state-of-the-art baselines, and case studies of a real Web system demonstrate practical benefit in highway transportation.
    User Training with Error Augmentation for Electromyogram-based Gesture Classification. (arXiv:2309.07289v1 [cs.HC])
    We designed and tested a system for real-time control of a user interface by extracting surface electromyographic (sEMG) activity from eight electrodes in a wrist-band configuration. sEMG data were streamed into a machine-learning algorithm that classified hand gestures in real-time. After an initial model calibration, participants were presented with one of three types of feedback during a human-learning stage: veridical feedback, in which predicted probabilities from the gesture classification algorithm were displayed without alteration, modified feedback, in which we applied a hidden augmentation of error to these probabilities, and no feedback. User performance was then evaluated in a series of minigames, in which subjects were required to use eight gestures to manipulate their game avatar to complete a task. Experimental results indicated that, relative to baseline, the modified feedback condition led to significantly improved accuracy and improved gesture class separation. These findings suggest that real-time feedback in a gamified user interface with manipulation of feedback may enable intuitive, rapid, and accurate task acquisition for sEMG-based gesture recognition applications.
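    The exact form of the hidden error augmentation is not given in the abstract; one simple way to realize it (a hypothetical numpy sketch, with the attenuation factor alpha made up) is to shrink the displayed probability of the intended gesture and renormalize, so residual errors look larger to the user than the classifier actually reports.

```python
import numpy as np

def augment_error(probs, target_idx, alpha=0.3):
    """Hypothetical feedback modification: attenuate the intended gesture's displayed
    probability by (1 - alpha) and renormalize before showing it to the user."""
    shown = probs.copy()
    shown[target_idx] *= (1.0 - alpha)
    return shown / shown.sum()

probs = np.array([0.05, 0.70, 0.10, 0.15])    # classifier output for four gestures
displayed = augment_error(probs, target_idx=1)
print(displayed)                              # the intended gesture now looks less certain
```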
    Optimal and Fair Encouragement Policy Evaluation and Learning. (arXiv:2309.07176v1 [cs.LG])
    In consequential domains, it is often impossible to compel individuals to take treatment, so that optimal policy rules are merely suggestions in the presence of human non-adherence to treatment recommendations. In these same domains, there may be heterogeneity both in who responds in taking-up treatment, and heterogeneity in treatment efficacy. While optimal treatment rules can maximize causal outcomes across the population, access parity constraints or other fairness considerations can be relevant in the case of encouragement. For example, in social services, a persistent puzzle is the gap in take-up of beneficial services among those who may benefit from them the most. When in addition the decision-maker has distributional preferences over both access and average outcomes, the optimal decision rule changes. We study causal identification, statistical variance-reduced estimation, and robust estimation of optimal treatment rules, including under potential violations of positivity. We consider fairness constraints such as demographic parity in treatment take-up, and other constraints, via constrained optimization. Our framework can be extended to handle algorithmic recommendations under an often-reasonable covariate-conditional exclusion restriction, using our robustness checks for lack of positivity in the recommendation. We develop a two-stage algorithm for solving over parametrized policy classes under general constraints to obtain variance-sensitive regret bounds. We illustrate the methods in two case studies based on data from randomized encouragement to enroll in insurance and from pretrial supervised release with electronic monitoring.
    Deep Spatiotemporal Clustering: A Temporal Clustering Approach for Multi-dimensional Climate Data. (arXiv:2304.14541v2 [cs.LG] UPDATED)
    Clustering high-dimensional spatiotemporal data using an unsupervised approach is a challenging problem for many data-driven applications. Existing state-of-the-art methods for unsupervised clustering use different similarity and distance functions but focus on either spatial or temporal features of the data. Concentrating on joint deep representation learning of spatial and temporal features, we propose Deep Spatiotemporal Clustering (DSC), a novel algorithm for the temporal clustering of high-dimensional spatiotemporal data using an unsupervised deep learning method. Inspired by the U-net architecture, DSC utilizes an autoencoder integrating CNN-RNN layers to learn latent representations of the spatiotemporal data. DSC also includes a unique layer for cluster assignment on latent representations that uses the Student's t-distribution. By optimizing the clustering loss and data reconstruction loss simultaneously, the algorithm gradually improves clustering assignments and the nonlinear mapping between low-dimensional latent feature space and high-dimensional original data space. A multivariate spatiotemporal climate dataset is used to evaluate the efficacy of the proposed method. Our extensive experiments show our approach outperforms both conventional and deep learning-based unsupervised clustering algorithms. Additionally, we compared the proposed model with its various variants (CNN encoder, CNN autoencoder, CNN-RNN encoder, CNN-RNN autoencoder, etc.) to get insight into using both the CNN and RNN layers in the autoencoder, and our proposed technique outperforms these variants in terms of clustering results.
    Short-term power load forecasting method based on CNN-SAEDN-Res. (arXiv:2309.07140v1 [eess.SP])
    In deep learning, load data with non-temporal factors are difficult for sequence models to process, which results in insufficient prediction precision. Therefore, a short-term load forecasting method based on a convolutional neural network (CNN), a self-attention encoder-decoder network (SAEDN) and residual refinement (Res) is proposed. In this method, the feature extraction module is composed of a two-dimensional convolutional neural network, which is used to mine the local correlation between data and obtain high-dimensional data features. The initial load forecasting module consists of a self-attention encoder-decoder network and a feedforward neural network (FFN). The module utilizes self-attention mechanisms to encode high-dimensional features, which captures the global correlation between data. Therefore, the model is able to retain important information based on the coupling relationships within data mixed with non-temporal factors. Then, self-attention decoding is performed and the feedforward neural network is used to regress the initial load. This paper introduces the residual mechanism to build the load optimization module. The module generates residual load values to optimize the initial load. The simulation results show that the proposed load forecasting method has advantages in terms of prediction accuracy and prediction stability.  ( 2 min )
    TargetCall: Eliminating the Wasted Computation in Basecalling via Pre-Basecalling Filtering. (arXiv:2212.04953v2 [q-bio.GN] UPDATED)
    Basecalling is an essential step in nanopore sequencing analysis where the raw signals of nanopore sequencers are converted into nucleotide sequences, i.e., reads. State-of-the-art basecallers employ complex deep learning models to achieve high basecalling accuracy. This makes basecalling computationally inefficient and memory-hungry, bottlenecking the entire genome analysis pipeline. However, for many applications, the majority of reads do not match the reference genome of interest (i.e., target reference) and thus are discarded in later steps in the genomics pipeline, wasting the basecalling computation. To overcome this issue, we propose TargetCall, the first pre-basecalling filter to eliminate the wasted computation in basecalling. TargetCall's key idea is to discard reads that will not match the target reference (i.e., off-target reads) prior to basecalling. TargetCall consists of two main components: (1) LightCall, a lightweight neural network basecaller that produces noisy reads; and (2) Similarity Check, which labels each of these noisy reads as on-target or off-target by matching them to the target reference. TargetCall aims to filter out all off-target reads before basecalling. The highly-accurate but slow basecalling is performed only on the raw signals whose noisy reads are labeled as on-target. Our thorough experimental evaluations using both real and simulated data show that TargetCall 1) improves the end-to-end basecalling performance while maintaining high sensitivity in keeping on-target reads, 2) maintains high accuracy in downstream analysis, 3) precisely filters out up to 94.71% of off-target reads, and 4) achieves better performance, throughput, sensitivity, precision, and generality compared to prior works. We open-source TargetCall at https://github.com/CMU-SAFARI/TargetCall  ( 3 min )
    Contrastive Tuning: A Little Help to Make Masked Autoencoders Forget. (arXiv:2304.10520v2 [cs.CV] UPDATED)
    Masked Image Modeling (MIM) methods, like Masked Autoencoders (MAE), efficiently learn a rich representation of the input. However, for adapting to downstream tasks, they require a sufficient amount of labeled data since their rich features encode not only objects but also less relevant image background. In contrast, Instance Discrimination (ID) methods focus on objects. In this work, we study how to combine the efficiency and scalability of MIM with the ability of ID to perform downstream classification in the absence of large amounts of labeled data. To this end, we introduce Masked Autoencoder Contrastive Tuning (MAE-CT), a sequential approach that utilizes the implicit clustering of the Nearest Neighbor Contrastive Learning (NNCLR) objective to induce abstraction in the topmost layers of a pre-trained MAE. MAE-CT tunes the rich features such that they form semantic clusters of objects without using any labels. Notably, MAE-CT does not rely on hand-crafted augmentations and frequently achieves its best performances while using only minimal augmentations (crop & flip). Further, MAE-CT is compute efficient as it requires at most 10% overhead compared to MAE re-training. Applied to large and huge Vision Transformer (ViT) models, MAE-CT excels over previous self-supervised methods trained on ImageNet in linear probing, k-NN and low-shot classification accuracy as well as in unsupervised clustering accuracy. With ViT-H/16 MAE-CT achieves a new state-of-the-art in linear probing of 82.2%.  ( 3 min )
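    Since MAE-CT builds on the NNCLR objective, a generic sketch of that loss may help (PyTorch, a simplification with random tensors standing in for MAE features and a fixed support queue; the actual method tunes only the topmost layers and fills the queue with past embeddings): each anchor embedding is replaced by its nearest neighbour in the queue before an InfoNCE loss against the other augmented view.

```python
import torch
import torch.nn.functional as F

def nnclr_loss(z1, z2, queue, temperature=0.1):
    """NNCLR-style objective: swap each anchor for its nearest neighbour in a support
    queue, then apply InfoNCE against the matching embedding of the other view."""
    z1, z2, queue = (F.normalize(t, dim=-1) for t in (z1, z2, queue))
    nn_idx = (z1 @ queue.T).argmax(dim=-1)          # nearest neighbour of each anchor
    logits = queue[nn_idx] @ z2.T / temperature     # (batch, batch) similarities
    labels = torch.arange(z1.shape[0])
    return F.cross_entropy(logits, labels)

z1, z2 = torch.randn(64, 128), torch.randn(64, 128)   # embeddings of two augmented views
queue = torch.randn(4096, 128)                          # support set of past embeddings
print(nnclr_loss(z1, z2, queue))
```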
    All you need is spin: SU(2) equivariant variational quantum circuits based on spin networks. (arXiv:2309.07250v1 [quant-ph])
    Variational algorithms require architectures that naturally constrain the optimisation space to run efficiently. In geometric quantum machine learning, one achieves this by encoding group structure into parameterised quantum circuits to include the symmetries of a problem as an inductive bias. However, constructing such circuits is challenging as a concrete guiding principle has yet to emerge. In this paper, we propose the use of spin networks, a form of directed tensor network invariant under a group transformation, to devise SU(2) equivariant quantum circuit ans\"atze -- circuits possessing spin rotation symmetry. By changing to the basis that block diagonalises SU(2) group action, these networks provide a natural building block for constructing parameterised equivariant quantum circuits. We prove that our construction is mathematically equivalent to other known constructions, such as those based on twirling and generalised permutations, but more direct to implement on quantum hardware. The efficacy of our constructed circuits is tested by solving the ground state problem of SU(2) symmetric Heisenberg models on the one-dimensional triangular lattice and on the Kagome lattice. Our results highlight that our equivariant circuits boost the performance of quantum variational algorithms, indicating broader applicability to other real-world problems.  ( 2 min )
    TempEE: Temporal-Spatial Parallel Transformer for Radar Echo Extrapolation Beyond Auto-Regression. (arXiv:2304.14131v2 [eess.SP] UPDATED)
    Meteorological radar reflectivity data (i.e. radar echo) significantly influences precipitation prediction. It can facilitate accurate and expeditious forecasting of short-term heavy rainfall bypassing the need for complex Numerical Weather Prediction (NWP) models. In comparison to conventional models, Deep Learning (DL)-based radar echo extrapolation algorithms exhibit higher effectiveness and efficiency. Nevertheless, the development of reliable and generalized echo extrapolation algorithm is impeded by three primary challenges: cumulative error spreading, imprecise representation of sparsely distributed echoes, and inaccurate description of non-stationary motion processes. To tackle these challenges, this paper proposes a novel radar echo extrapolation algorithm called Temporal-Spatial Parallel Transformer, referred to as TempEE. TempEE avoids using auto-regression and instead employs a one-step forward strategy to prevent cumulative error spreading during the extrapolation process. Additionally, we propose the incorporation of a Multi-level Temporal-Spatial Attention mechanism to improve the algorithm's capability of capturing both global and local information while emphasizing task-related regions, including sparse echo representations, in an efficient manner. Furthermore, the algorithm extracts spatio-temporal representations from continuous echo images using a parallel encoder to model the non-stationary motion process for echo extrapolation. The superiority of our TempEE has been demonstrated in the context of the classic radar echo extrapolation task, utilizing a real-world dataset. Extensive experiments have further validated the efficacy and indispensability of various components within TempEE.  ( 3 min )
    Survival Estimation for Missing not at Random Censoring Indicators based on Copula Models. (arXiv:2009.01726v2 [stat.ML] UPDATED)
    In the presence of right-censored data with covariates, the conditional Kaplan-Meier estimator (also known as the Beran estimator) consistently estimates the conditional survival function of the random follow-up for the event of interest. However, a necessary condition is the unambiguous knowledge of whether each individual is censored or not, which may be incomplete in practice. We therefore propose a study of the Beran estimator when the censoring indicators are generic random variables and discuss necessary conditions for the efficiency of the Beran estimator. From this, we provide a new estimator for the conditional survival function with missing not at random (MNAR) censoring indicators based on a conditional copula model for the missingness mechanism. In addition to the theoretical results, we illustrate how the estimators work for small samples through a simulation study and show their practical applicability by analyzing synthetic and real data.  ( 2 min )
    Turning Dross Into Gold Loss: is BERT4Rec really better than SASRec?. (arXiv:2309.07602v1 [cs.IR])
    Recently, sequential recommendation and the next-item prediction task have become increasingly popular in the field of recommender systems. Currently, two state-of-the-art baselines are the Transformer-based models SASRec and BERT4Rec. Over the past few years, there have been quite a few publications comparing these two algorithms and proposing new state-of-the-art models. In most of the publications, BERT4Rec achieves better performance than SASRec. But BERT4Rec uses cross-entropy over a softmax for all items, while SASRec uses negative sampling and calculates a binary cross-entropy loss for one positive and one negative item. In our work, we show that if both models are trained with the same loss, the one used by BERT4Rec, then SASRec significantly outperforms BERT4Rec both in terms of quality and training speed. In addition, we show that SASRec can be effectively trained with negative sampling and still outperform BERT4Rec, but the number of negative examples should be much larger than one.  ( 2 min )
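    A minimal sketch of the two losses being contrasted (PyTorch; the item count, embedding size, and random sequence representations are placeholders for a real sequential encoder): full cross-entropy over all items, as used by BERT4Rec, versus binary cross-entropy with one sampled negative, as in the original SASRec. The paper's point is that the same SASRec architecture trained with the former closes, and reverses, the reported gap.

```python
import torch
import torch.nn.functional as F

n_items, d, batch = 10_000, 64, 32
item_emb = torch.nn.Embedding(n_items, d)          # shared item embedding table
seq_repr = torch.randn(batch, d)                   # stand-in for the sequence encoder output
pos_items = torch.randint(0, n_items, (batch,))    # ground-truth next items

# BERT4Rec-style loss: cross-entropy over the full softmax of all items.
logits_all = seq_repr @ item_emb.weight.T          # (batch, n_items)
loss_full_ce = F.cross_entropy(logits_all, pos_items)

# Original SASRec-style loss: binary cross-entropy with one sampled negative per positive.
neg_items = torch.randint(0, n_items, (batch,))
pos_scores = (seq_repr * item_emb(pos_items)).sum(-1)
neg_scores = (seq_repr * item_emb(neg_items)).sum(-1)
loss_sampled_bce = (
    F.binary_cross_entropy_with_logits(pos_scores, torch.ones_like(pos_scores))
    + F.binary_cross_entropy_with_logits(neg_scores, torch.zeros_like(neg_scores))
)
```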
    Solving Recurrence Relations using Machine Learning, with Application to Cost Analysis. (arXiv:2309.07259v1 [cs.PL])
    Automatic static cost analysis infers information about the resources used by programs without actually running them with concrete data, and presents such information as functions of input data sizes. Most of the analysis tools for logic programs (and other languages) are based on setting up recurrence relations representing (bounds on) the computational cost of predicates, and solving them to find closed-form functions that are equivalent to (or a bound on) them. Such recurrence solving is a bottleneck in current tools: many of the recurrences that arise during the analysis cannot be solved with current solvers, such as Computer Algebra Systems (CASs), so that specific methods for different classes of recurrences need to be developed. We address such a challenge by developing a novel, general approach for solving arbitrary, constrained recurrence relations, that uses machine-learning sparse regression techniques to guess a candidate closed-form function, and a combination of an SMT-solver and a CAS to check whether such function is actually a solution of the recurrence. We have implemented a prototype and evaluated it with recurrences generated by a cost analysis system (the one in CiaoPP). The experimental results are quite promising, showing that our approach can find closed-form solutions, in a reasonable time, for classes of recurrences that cannot be solved by such a system, nor by current CASs.  ( 3 min )
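    A toy sketch of the guess-and-check loop on a simple recurrence (numpy only; the candidate basis, the least-squares-with-thresholding stand-in for sparse regression, and the numeric substitution check in place of the SMT-solver/CAS verification are simplifications of the described approach):

```python
import numpy as np

# Toy recurrence: f(0) = 0, f(n) = f(n-1) + n, whose closed form is n*(n+1)/2.
N = 30
f = np.zeros(N)
for n in range(1, N):
    f[n] = f[n - 1] + n

n = np.arange(N, dtype=float)
basis = np.stack([np.ones(N), n, n**2, n**3, np.log(n + 1)], axis=1)
names = ["1", "n", "n^2", "n^3", "log(n+1)"]

# "Guess": fit the candidate basis and zero out negligible coefficients.
coef, *_ = np.linalg.lstsq(basis, f, rcond=None)
coef[np.abs(coef) < 1e-6] = 0.0
print("candidate closed form:",
      " + ".join(f"{c:.3g}*{t}" for c, t in zip(coef, names) if c != 0))  # ~ 0.5*n + 0.5*n^2

# "Check": substitute the candidate back into the recurrence on all points.
g = basis @ coef
assert np.allclose(g[1:], g[:-1] + n[1:], atol=1e-6)
```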
    Communication Efficient Private Federated Learning Using Dithering. (arXiv:2309.07809v1 [cs.LG])
    The task of preserving privacy while ensuring efficient communication is a fundamental challenge in federated learning. In this work, we tackle this challenge in the trusted aggregator model, and propose a solution that achieves both objectives simultaneously. We show that employing a quantization scheme based on subtractive dithering at the clients can effectively replicate the normal noise addition process at the aggregator. This implies that we can guarantee the same level of differential privacy against other clients while substantially reducing the amount of communication required, as opposed to transmitting full precision gradients and using central noise addition. We also experimentally demonstrate that the accuracy of our proposed approach matches that of the full precision gradient method.  ( 2 min )
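    A minimal numpy sketch of subtractive dithered quantization (the step size and the shared-seed stand-in for the shared randomness between client and aggregator are illustrative): the client adds a uniform dither before rounding to a grid and the aggregator subtracts the same dither, so the residual error is uniform and independent of the gradient, which is the property leveraged to emulate the aggregator's noise addition.

```python
import numpy as np

def dither_quantize(x, step, seed):
    """Client side: add a shared uniform dither, then round to the quantization grid."""
    dither = np.random.default_rng(seed).uniform(-step / 2, step / 2, size=x.shape)
    return np.round((x + dither) / step).astype(np.int64)   # integers sent over the wire

def dither_dequantize(q, step, seed):
    """Aggregator side: rebuild the value and subtract the same dither."""
    dither = np.random.default_rng(seed).uniform(-step / 2, step / 2, size=q.shape)
    return q * step - dither

rng = np.random.default_rng(0)
grad = rng.normal(size=10_000)
step, seed = 0.05, 7                       # the shared seed stands in for shared randomness
recovered = dither_dequantize(dither_quantize(grad, step, seed), step, seed)
err = recovered - grad                     # ~ Uniform(-step/2, step/2), independent of grad
print(err.min(), err.max())
```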
    Hodge-Aware Contrastive Learning. (arXiv:2309.07364v1 [cs.LG])
    Simplicial complexes prove effective in modeling data with multiway dependencies, such as data defined along the edges of networks or within other higher-order structures. Their spectrum can be decomposed into three interpretable subspaces via the Hodge decomposition, a decomposition that is foundational in numerous applications. We leverage this decomposition to develop a contrastive self-supervised learning approach for processing simplicial data and generating embeddings that encapsulate specific spectral information. Specifically, we encode the pertinent data invariances through simplicial neural networks and devise augmentations that yield positive contrastive examples with suitable spectral properties for downstream tasks. Additionally, we reweight the significance of negative examples in the contrastive loss, considering the similarity of their Hodge components to the anchor. By encouraging a stronger separation among less similar instances, we obtain an embedding space that reflects the spectral properties of the data. The numerical results on two standard edge flow classification tasks show a superior performance even when compared to supervised learning techniques. Our findings underscore the importance of adopting a spectral perspective for contrastive learning with higher-order data.  ( 2 min )
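    For reference, a small numpy sketch of the Hodge decomposition being leveraged (the toy complex and edge flow are made up; the paper's simplicial neural networks, augmentations, and reweighted contrastive loss are not shown): an edge flow is split into gradient, curl, and harmonic components using the incidence matrices, and the harmonic part lies in the kernel of the Hodge 1-Laplacian.

```python
import numpy as np

# Small simplicial complex: 4 nodes, edges (0,1),(1,2),(2,3),(0,3),(0,2),
# and one filled triangle (0,1,2); the cycle 0-2-3-0 stays unfilled (harmonic).
B1 = np.array([[-1,  0,  0, -1, -1],   # node-to-edge incidence
               [ 1, -1,  0,  0,  0],
               [ 0,  1, -1,  0,  1],
               [ 0,  0,  1,  1,  0]], dtype=float)
B2 = np.array([[1], [1], [0], [0], [-1]], dtype=float)   # edge-to-triangle incidence

flow = np.array([1.0, 2.0, -0.5, 3.0, 0.7])     # an edge flow (1-cochain)

# Hodge decomposition: gradient part in im(B1^T), curl part in im(B2), rest harmonic.
grad = B1.T @ np.linalg.lstsq(B1.T, flow, rcond=None)[0]
curl = B2 @ np.linalg.lstsq(B2, flow, rcond=None)[0]
harmonic = flow - grad - curl

L1 = B1.T @ B1 + B2 @ B2.T                       # Hodge 1-Laplacian
print(np.allclose(L1 @ harmonic, np.zeros(5), atol=1e-10))   # harmonic part is in ker(L1)
```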
    Variational Quantum Linear Solver enhanced Quantum Support Vector Machine. (arXiv:2309.07770v1 [quant-ph])
    Quantum Support Vector Machines (QSVM) play a vital role in using quantum resources for supervised machine learning tasks, such as classification. However, current methods are strongly limited in terms of scalability on Noisy Intermediate Scale Quantum (NISQ) devices. In this work, we propose a novel approach called the Variational Quantum Linear Solver (VQLS) enhanced QSVM. This is built upon our idea of utilizing the variational quantum linear solver to solve the system of linear equations of a least-squares SVM on a NISQ device. The implementation of our approach is evaluated by an extensive series of numerical experiments with the Iris dataset, which consists of three distinct iris plant species. Based on this, we explore the practicality and effectiveness of our algorithm by constructing a classifier capable of classification in a feature space ranging from one to seven dimensions. Furthermore, by strategically exploiting both classical and quantum computing for various subroutines of our algorithm, we effectively mitigate practical challenges associated with the implementation. These include significant improvement in the trainability of the variational ansatz and notable reductions in run-time for cost calculations. Based on the numerical experiments, our approach exhibits the capability of identifying a separating hyperplane in an 8-dimensional feature space. Moreover, it consistently demonstrated strong performance across various instances with the same dataset.  ( 2 min )
    When Are Two Lists Better than One?: Benefits and Harms in Joint Decision-making. (arXiv:2308.11721v2 [cs.LG] UPDATED)
    Historically, much of machine learning research has focused on the performance of the algorithm alone, but recently more attention has been focused on optimizing joint human-algorithm performance. Here, we analyze a specific type of human-algorithm collaboration where the algorithm has access to a set of $n$ items, and presents a subset of size $k$ to the human, who selects a final item from among those $k$. This scenario could model content recommendation, route planning, or any type of labeling task. Because both the human and algorithm have imperfect, noisy information about the true ordering of items, the key question is: which value of $k$ maximizes the probability that the best item will be ultimately selected? For $k=1$, performance is optimized by the algorithm acting alone, and for $k=n$ it is optimized by the human acting alone. Surprisingly, we show that for multiple noise models, it is optimal to set $k \in [2, n-1]$ - that is, there are strict benefits to collaborating, even when the human and algorithm have equal accuracy separately. We demonstrate this theoretically for the Mallows model and experimentally for the Random Utilities models of noisy permutations. However, we show this pattern is reversed when the human is anchored on the algorithm's presented ordering - the joint system always has strictly worse performance. We extend these results to the case where the human and algorithm differ in their accuracy levels, showing that there always exist regimes where a more accurate agent would strictly benefit from collaborating with a less accurate one, but these regimes are asymmetric between the human and the algorithm's accuracy.  ( 3 min )
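    A quick Monte Carlo sketch of the setting (numpy; independent Gaussian score noise is a stand-in for the Mallows and random-utility models analyzed in the paper, and the noise levels are arbitrary): the algorithm shortlists its noisy top-k, the human picks from the shortlist under its own noise, and one can tabulate how often the truly best item survives as k varies.

```python
import numpy as np

rng = np.random.default_rng(0)

def p_best_selected(n=20, k=5, sigma_alg=1.0, sigma_human=1.0, trials=20_000):
    """Probability that the truly best of n items is the final pick when the algorithm
    shortlists its noisy top-k and the human then chooses from that shortlist."""
    wins = 0
    for _ in range(trials):
        true_vals = rng.normal(size=n)
        alg_scores = true_vals + sigma_alg * rng.normal(size=n)
        shortlist = np.argsort(alg_scores)[-k:]                    # algorithm's top-k
        human_scores = true_vals[shortlist] + sigma_human * rng.normal(size=k)
        wins += shortlist[human_scores.argmax()] == true_vals.argmax()
    return wins / trials

for k in [1, 2, 5, 10, 20]:
    print(k, p_best_selected(k=k))   # compare selection accuracy across shortlist sizes
```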
    Predicting Survival Time of Ball Bearings in the Presence of Censoring. (arXiv:2309.07188v1 [eess.SP])
    Ball bearings find widespread use in various manufacturing and mechanical domains, and methods based on machine learning have been widely adopted in the field to monitor wear and spot defects before they lead to failures. Few studies, however, have addressed the problem of censored data, in which failure is not observed. In this paper, we propose a novel approach to predict the time to failure in ball bearings using survival analysis. First, we analyze bearing data in the frequency domain and annotate when a bearing fails by comparing the Kullback-Leibler divergence and the standard deviation between its break-in frequency bins and its break-out frequency bins. Second, we train several survival models to estimate the time to failure based on the annotated data and covariates extracted from the time domain, such as skewness, kurtosis and entropy. The models give a probabilistic prediction of risk over time and allow us to compare the survival function between groups of bearings. We demonstrate our approach on the XJTU and PRONOSTIA datasets. On XJTU, the best result is a 0.70 concordance-index and 0.21 integrated Brier score. On PRONOSTIA, the best is a 0.76 concordance-index and 0.19 integrated Brier score. Our work motivates further work on incorporating censored data in models for predictive maintenance.  ( 2 min )
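    A small numpy sketch of the annotation step described above (the signals, sampling rate, and bin count are made up, and the accompanying standard-deviation criterion and downstream survival models are omitted): vibration spectra are binned into normalized energy histograms, and the Kullback-Leibler divergence between break-in and break-out histograms flags the failure time.

```python
import numpy as np

def spectral_histogram(signal, n_bins=32):
    """Normalized distribution of vibration energy over frequency bins."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    energy = np.array([chunk.sum() for chunk in np.array_split(spectrum, n_bins)]) + 1e-12
    return energy / energy.sum()

def kl_divergence(p, q):
    return float(np.sum(p * np.log(p / q)))

fs = 25_600                                   # hypothetical sampling rate (Hz)
t = np.arange(0, 1.0, 1 / fs)
rng = np.random.default_rng(0)
break_in = np.sin(2 * np.pi * 120 * t) + 0.1 * rng.normal(size=t.size)
break_out = (np.sin(2 * np.pi * 120 * t) + 0.8 * np.sin(2 * np.pi * 3_000 * t)
             + 0.1 * rng.normal(size=t.size))          # extra high-frequency content near failure

drift = kl_divergence(spectral_histogram(break_out), spectral_histogram(break_in))
print(drift)          # a large drift (together with a std-dev check) marks the failure annotation
```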
    On a continuous time model of gradient descent dynamics and instability in deep learning. (arXiv:2302.01952v3 [stat.ML] UPDATED)
    The recipe behind the success of deep learning has been the combination of neural networks and gradient-based optimization. Understanding the behavior of gradient descent, however, and particularly its instability, has lagged behind its empirical success. To add to the theoretical tools available to study gradient descent we propose the principal flow (PF), a continuous time flow that approximates gradient descent dynamics. To our knowledge, the PF is the only continuous flow that captures the divergent and oscillatory behaviors of gradient descent, including escaping local minima and saddle points. Through its dependence on the eigendecomposition of the Hessian, the PF sheds light on the recently observed edge-of-stability phenomena in deep learning. Using our new understanding of instability we propose a learning rate adaptation method which enables us to control the trade-off between training stability and test set evaluation performance.  ( 2 min )
    Latent Representation and Simulation of Markov Processes via Time-Lagged Information Bottleneck. (arXiv:2309.07200v1 [cs.LG])
    Markov processes are widely used mathematical models for describing dynamic systems in various fields. However, accurately simulating large-scale systems at long time scales is computationally expensive due to the short time steps required for accurate integration. In this paper, we introduce an inference process that maps complex systems into a simplified representational space and models large jumps in time. To achieve this, we propose Time-lagged Information Bottleneck (T-IB), a principled objective rooted in information theory, which aims to capture relevant temporal features while discarding high-frequency information to simplify the simulation task and minimize the inference error. Our experiments demonstrate that T-IB learns information-optimal representations for accurately modeling the statistical properties and dynamics of the original process at a selected time lag, outperforming existing time-lagged dimensionality reduction methods.  ( 2 min )
    Physically Plausible Full-Body Hand-Object Interaction Synthesis. (arXiv:2309.07907v1 [cs.RO])
    We propose a physics-based method for synthesizing dexterous hand-object interactions in a full-body setting. While recent advancements have addressed specific facets of human-object interactions, a comprehensive physics-based approach remains a challenge. Existing methods often focus on isolated segments of the interaction process and rely on data-driven techniques that may result in artifacts. In contrast, our proposed method embraces reinforcement learning (RL) and physics simulation to mitigate the limitations of data-driven approaches. Through a hierarchical framework, we first learn skill priors for both body and hand movements in a decoupled setting. The generic skill priors learn to decode a latent skill embedding into the motion of the underlying part. A high-level policy then controls hand-object interactions in these pretrained latent spaces, guided by task objectives of grasping and 3D target trajectory following. It is trained using a novel reward function that combines an adversarial style term with a task reward, encouraging natural motions while fulfilling the task incentives. Our method successfully accomplishes the complete interaction task, from approaching an object to grasping and subsequent manipulation. We compare our approach against kinematics-based baselines and show that it leads to more physically plausible motions.  ( 2 min )
    VerilogEval: Evaluating Large Language Models for Verilog Code Generation. (arXiv:2309.07544v1 [cs.LG])
    The increasing popularity of large language models (LLMs) has paved the way for their application in diverse domains. This paper proposes a benchmarking framework tailored specifically for evaluating LLM performance in the context of Verilog code generation for hardware design and verification. We present a comprehensive evaluation dataset consisting of 156 problems from the Verilog instructional website HDLBits. The evaluation set consists of a diverse set of Verilog code generation tasks, ranging from simple combinational circuits to complex finite state machines. The Verilog code completions can be automatically tested for functional correctness by comparing the transient simulation outputs of the generated design with a golden solution. We also demonstrate that the Verilog code generation capability of pretrained language models could be improved with supervised fine-tuning by bootstrapping with LLM generated synthetic problem-code pairs.  ( 2 min )
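    For context, functional-correctness benchmarks of this kind are usually summarized with a pass@k score over multiple sampled completions per problem. The sketch below uses the standard unbiased estimator; whether VerilogEval reports exactly this statistic is an assumption here, as are the example counts.

    ```python
    from math import comb

    def pass_at_k(n: int, c: int, k: int) -> float:
        """Unbiased pass@k estimator: probability that at least one of k samples
        drawn from n generated completions, of which c passed the
        functional-correctness check, is correct."""
        if n - c < k:
            return 1.0
        return 1.0 - comb(n - c, k) / comb(n, k)

    # e.g. 20 completions sampled for one HDLBits problem, 5 of which matched
    # the golden simulation output (illustrative numbers):
    print(pass_at_k(n=20, c=5, k=1))    # 0.25
    print(pass_at_k(n=20, c=5, k=10))   # ~0.98
    ```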
    Machine Learning and Computer Vision Techniques in Continuous Beehive Monitoring Applications: A survey. (arXiv:2208.00085v3 [cs.CV] UPDATED)
    The wide availability of machine learning and computer vision techniques allows the development of relatively complex monitoring systems in many domains. Besides the traditional industrial domain, new applications are appearing in biology and agriculture, including the detection of infections, parasites, and weeds, as well as automated monitoring and early warning systems. This development has also been aided by easily accessible hardware and development kits such as the Arduino and Raspberry Pi families. In this paper, we survey 50 existing papers on methods for automated beehive monitoring using computer vision techniques, particularly pollen and Varroa mite detection together with bee traffic monitoring. Such systems could also be used to monitor honeybee colonies and to inspect their health state, which could identify potentially dangerous conditions before the situation becomes critical, or to better plan periodic bee colony inspections and therefore save significant costs. We also analyze the research trends in this application field and outline possible directions for new explorations. Our paper is also aimed at veterinary and apidology professionals and experts who might not be familiar with machine learning, to introduce them to its possibilities; therefore, each family of applications opens with a brief theoretical introduction and motivation related to its base method. We hope that this paper will inspire other scientists to apply machine learning techniques to other applications in beehive monitoring.  ( 3 min )
    Kernel Conditional Moment Constraints for Confounding Robust Inference. (arXiv:2302.13348v2 [stat.ML] UPDATED)
    We study policy evaluation of offline contextual bandits subject to unobserved confounders. Sensitivity analysis methods are commonly used to estimate the policy value under the worst-case confounding over a given uncertainty set. However, existing work often resorts to some coarse relaxation of the uncertainty set for the sake of tractability, leading to overly conservative estimation of the policy value. In this paper, we propose a general estimator that provides a sharp lower bound of the policy value. It can be shown that our estimator contains the recently proposed sharp estimator by Dorn and Guo (2022) as a special case, and our method enables a novel extension of the classical marginal sensitivity model using f-divergence. To construct our estimator, we leverage the kernel method to obtain a tractable approximation to the conditional moment constraints, which traditional non-sharp estimators failed to take into account. In the theoretical analysis, we provide a condition for the choice of the kernel which guarantees no specification error that biases the lower bound estimation. Furthermore, we provide consistency guarantees of policy evaluation and learning. In the experiments with synthetic and real-world data, we demonstrate the effectiveness of the proposed method.  ( 2 min )
    EarthPT: a foundation model for Earth Observation. (arXiv:2309.07207v1 [cs.LG])
    We introduce EarthPT -- an Earth Observation (EO) pretrained transformer. EarthPT is a 700 million parameter decoding transformer foundation model trained in an autoregressive self-supervised manner and developed specifically with EO use-cases in mind. We demonstrate that EarthPT is an effective forecaster that can accurately predict future pixel-level surface reflectances across the 400-2300 nm range well into the future. For example, forecasts of the evolution of the Normalised Difference Vegetation Index (NDVI) have a typical error of approximately 0.05 (over a natural range of -1 -> 1) at the pixel level over a five month test set horizon, out-performing simple phase-folded models based on historical averaging. We also demonstrate that embeddings learnt by EarthPT hold semantically meaningful information and could be exploited for downstream tasks such as highly granular, dynamic land use classification. Excitingly, we note that the abundance of EO data provides us with -- in theory -- quadrillions of training tokens. Therefore, if we assume that EarthPT follows neural scaling laws akin to those derived for Large Language Models (LLMs), there is currently no data-imposed limit to scaling EarthPT and other similar `Large Observation Models.'  ( 2 min )
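    As a small concrete reference for the forecasts mentioned above, the NDVI itself is computed from red and near-infrared reflectances, and the phase-folded baseline amounts to predicting each month from its historical average for that pixel. The array shapes and the five-month horizon below are illustrative assumptions, not details taken from the paper.

    ```python
    import numpy as np

    def ndvi(nir: np.ndarray, red: np.ndarray) -> np.ndarray:
        # Normalised Difference Vegetation Index, bounded in [-1, 1].
        return (nir - red) / (nir + red + 1e-9)

    # Sketch of a "phase-folded" baseline: forecast a pixel's NDVI as the
    # historical mean for that calendar month.
    history = np.random.rand(36, 64, 64)                   # 3 years of monthly NDVI maps
    monthly_mean = history.reshape(3, 12, 64, 64).mean(axis=0)
    baseline_forecast = monthly_mean[:5]                   # months 1..5 of the next year
    ```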
    Learning Quasi-Static 3D Models of Markerless Deformable Linear Objects for Bimanual Robotic Manipulation. (arXiv:2309.07609v1 [cs.RO])
    The robotic manipulation of Deformable Linear Objects (DLOs) is a vital and challenging task that is important in many practical applications. Classical model-based approaches to this problem require an accurate model to capture how robot motions affect the deformation of the DLO. Nowadays, data-driven models offer the best tradeoff between quality and computation time. This paper analyzes several learning-based 3D models of the DLO and proposes a new one based on the Transformer architecture that achieves superior accuracy, even on the DLOs of different lengths, thanks to the proposed scaling method. Moreover, we introduce a data augmentation technique, which improves the prediction performance of almost all considered DLO data-driven models. Thanks to this technique, even a simple Multilayer Perceptron (MLP) achieves close to state-of-the-art performance while being significantly faster to evaluate. In the experiments, we compare the performance of the learning-based 3D models of the DLO on several challenging datasets quantitatively and demonstrate their applicability in the task of shaping a DLO.  ( 2 min )
    PolicyCleanse: Backdoor Detection and Mitigation in Reinforcement Learning. (arXiv:2202.03609v5 [cs.LG] UPDATED)
    While real-world applications of reinforcement learning are becoming popular, the security and robustness of RL systems are worthy of more attention and exploration. In particular, recent works have revealed that, in a multi-agent RL environment, backdoor trigger actions can be injected into a victim agent (a.k.a. Trojan agent), which can result in a catastrophic failure as soon as it sees the backdoor trigger action. To ensure the security of RL agents against malicious backdoors, in this work, we propose the problem of Backdoor Detection in a multi-agent competitive reinforcement learning system, with the objective of detecting Trojan agents as well as the corresponding potential trigger actions, and further trying to mitigate their Trojan behavior. In order to solve this problem, we propose PolicyCleanse, which is based on the property that an activated Trojan agent's accumulated rewards degrade noticeably after several timesteps. Along with PolicyCleanse, we also design a machine unlearning-based approach that can effectively mitigate the detected backdoor. Extensive experiments demonstrate that the proposed methods can accurately detect Trojan agents, and outperform existing backdoor mitigation baseline approaches by at least 3% in winning rate across various types of agents and environments.  ( 3 min )
    Learning From Drift: Federated Learning on Non-IID Data via Drift Regularization. (arXiv:2309.07189v1 [cs.LG])
    Federated learning algorithms perform reasonably well on independent and identically distributed (IID) data. They, on the other hand, suffer greatly from heterogeneous environments, i.e., Non-IID data. Despite the fact that many research projects have been done to address this issue, recent findings indicate that they are still sub-optimal when compared to training on IID data. In this work, we carefully analyze the existing methods in heterogeneous environments. Interestingly, we find that regularizing the classifier's outputs is quite effective in preventing performance degradation on Non-IID data. Motivated by this, we propose Learning from Drift (LfD), a novel method for effectively training the model in heterogeneous settings. Our scheme encapsulates two key components: drift estimation and drift regularization. Specifically, LfD first estimates how different the local model is from the global model (i.e., drift). The local model is then regularized such that it does not fall in the direction of the estimated drift. In the experiment, we evaluate each method through the lens of the five aspects of federated learning, i.e., Generalization, Heterogeneity, Scalability, Forgetting, and Efficiency. Comprehensive evaluation results clearly support the superiority of LfD in federated learning with Non-IID data.  ( 2 min )
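    A minimal sketch of the two components named above. The abstract does not give the exact loss, so the penalty form, its weight, and the PyTorch phrasing here are assumptions: the drift is estimated as the parameter-space difference between the local and global models, and the local update is discouraged from moving further along that direction.

    ```python
    import torch

    def drift_regulariser(local_model, global_model, lam=0.1):
        # Illustrative drift-regularisation term in the spirit of LfD:
        # estimate the drift as (local - global) parameters, then penalise
        # further movement of the local parameters along that direction.
        penalty = 0.0
        for p_local, p_global in zip(local_model.parameters(), global_model.parameters()):
            drift = (p_local - p_global).detach()   # drift estimate, no gradient flows through it
            penalty = penalty + lam * (p_local * drift).sum()
        return penalty

    # During one local client update (hypothetical names):
    # loss = task_loss(local_model(x), y) + drift_regulariser(local_model, global_model)
    # loss.backward(); optimizer.step()
    ```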
    Discrete Acoustic Space for an Efficient Sampling in Neural Text-To-Speech. (arXiv:2110.12539v3 [cs.SD] UPDATED)
    We present a Split Vector Quantized Variational Autoencoder (SVQ-VAE) architecture using a split vector quantizer for NTTS, as an enhancement to the well-known Variational Autoencoder (VAE) and Vector Quantized Variational Autoencoder (VQ-VAE) architectures. Compared to these previous architectures, our proposed model retains the benefits of using an utterance-level bottleneck, while keeping significant representation power and a discretized latent space small enough for efficient prediction from text. We train the model on recordings in the expressive task-oriented dialogues domain and show that SVQ-VAE achieves a statistically significant improvement in naturalness over the VAE and VQ-VAE models. Furthermore, we demonstrate that the SVQ-VAE latent acoustic space is predictable from text, reducing the gap between the standard constant vector synthesis and vocoded recordings by 32%.  ( 2 min )
    Using wearable device-based machine learning models to autonomously identify older adults with poor cognition. (arXiv:2309.07133v1 [eess.SP])
    Conducting cognitive tests is time-consuming for patients and clinicians. Wearable device-based prediction models allow for continuous health monitoring under normal living conditions and could offer an alternative to identifying older adults with cognitive impairments for early interventions. In this study, we first derived novel wearable-based features related to circadian rhythms, ambient light exposure, physical activity levels, sleep, and signal processing. Then, we quantified the ability of wearable-based machine-learning models to predict poor cognition based on outcomes from the Digit Symbol Substitution Test (DSST), the Consortium to Establish a Registry for Alzheimer's Disease Word-Learning subtest (CERAD-WL), and the Animal Fluency Test (AFT). We found that the wearable-based models had significantly higher AUCs when predicting all three cognitive outcomes compared to benchmark models containing age, sex, education, marital status, household income, diabetic status, depression symptoms, and functional independence scores. In addition to uncovering previously unidentified wearable-based features that are predictive of poor cognition, such as the standard deviation of the midpoints of each person's most active 10-hour periods and least active 5-hour periods, our paper provides proof-of-concept that wearable-based machine learning models can be used to autonomously screen older adults for possible cognitive impairments. Such models offer cost-effective alternatives to conducting initial screenings manually in clinical settings.  ( 2 min )
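    One of the named features, the variability of the midpoint of each person's most active 10-hour (M10) window, can be computed from minute-level activity counts roughly as follows; the window handling and the per-subject aggregation are assumptions made for illustration.

    ```python
    import numpy as np

    def m10_midpoint_hours(activity_per_minute: np.ndarray) -> float:
        # Midpoint (hours from midnight) of the most active 10-hour window
        # in one day of minute-level activity counts (1440 values).
        window = 10 * 60
        sums = np.convolve(activity_per_minute, np.ones(window), mode="valid")
        start = int(np.argmax(sums))             # first minute of the M10 window
        return (start + window / 2) / 60.0

    # Feature for one subject: variability of the M10 midpoint across days.
    days = [np.random.poisson(5.0, 1440) for _ in range(7)]   # 7 days of fake counts
    m10_midpoint_std = float(np.std([m10_midpoint_hours(d) for d in days]))
    ```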
    MAHTM: A Multi-Agent Framework for Hierarchical Transactive Microgrids. (arXiv:2303.08447v2 [cs.LG] UPDATED)
    Integrating variable renewable energy into the grid has posed challenges to system operators in achieving optimal trade-offs among energy availability, cost affordability, and pollution controllability. This paper proposes a multi-agent reinforcement learning framework for managing energy transactions in microgrids. The framework addresses the challenges above: it seeks to optimize the usage of available resources by minimizing the carbon footprint while benefiting all stakeholders. The proposed architecture consists of three layers of agents, each pursuing different objectives. The first layer, comprised of prosumers and consumers, minimizes the total energy cost. The other two layers control the energy price to decrease the carbon impact while balancing the consumption and production of both renewable and conventional energy. This framework also takes into account fluctuations in energy demand and supply.  ( 2 min )
    BAFFLE: Backdoor Attack in Offline Reinforcement Learning. (arXiv:2210.04688v2 [cs.LG] UPDATED)
    A growing body of research has focused on Reinforcement Learning (RL) methods that allow the agent to learn from trial-and-error experiences gathered during interaction with the environment. Recently, offline RL has become a popular RL paradigm because it avoids interaction with the environment. In offline RL, data providers share large pre-collected datasets, and others can train high-quality agents without interacting with the environments. This paradigm has demonstrated effectiveness in critical tasks like robot control and autonomous driving. However, less attention has been paid to investigating the security threats to the offline RL system. This paper focuses on backdoor attacks, where some perturbations are added to the data (observations) such that the agent takes high-reward actions given normal observations, and low-reward actions on observations injected with triggers. In this paper, we propose Baffle (Backdoor Attack for Offline Reinforcement Learning), an approach that automatically implants backdoors into RL agents by poisoning the offline RL dataset, and evaluate how different offline RL algorithms react to this attack. Our experiments conducted on four tasks and four offline RL algorithms expose a disquieting fact: none of the existing offline RL algorithms is immune to such a backdoor attack. Baffle modifies $10\%$ of the datasets for four tasks. Agents trained on the poisoned datasets perform well in normal settings. However, when triggers are presented, the agents' performance decreases drastically by $63.2\%$, $53.9\%$, $64.7\%$, and $47.4\%$ in the four tasks on average. The backdoor still persists after fine-tuning poisoned agents on clean datasets. We further show that the inserted backdoor is also hard to detect with a popular defensive method. This paper calls attention to developing more effective protection for the open-source offline RL dataset.  ( 3 min )
    Gaussian Process Surrogate Models for Neural Networks. (arXiv:2208.06028v2 [cs.LG] UPDATED)
    Not being able to understand and predict the behavior of deep learning systems makes it hard to decide what architecture and algorithm to use for a given problem. In science and engineering, modeling is a methodology used to understand complex systems whose internal processes are opaque. Modeling replaces a complex system with a simpler, more interpretable surrogate. Drawing inspiration from this, we construct a class of surrogate models for neural networks using Gaussian processes. Rather than deriving kernels for infinite neural networks, we learn kernels empirically from the naturalistic behavior of finite neural networks. We demonstrate our approach captures existing phenomena related to the spectral bias of neural networks, and then show that our surrogate models can be used to solve practical problems such as identifying which points most influence the behavior of specific neural networks and predicting which architectures and algorithms will generalize well for specific datasets.  ( 2 min )
    Some notes concerning a generalized KMM-type optimization method for density ratio estimation. (arXiv:2309.07887v1 [cs.LG])
    In the present paper we introduce new optimization algorithms for the task of density ratio estimation. More precisely, we consider extending the well-known KMM method using the construction of a suitable loss function, in order to encompass more general situations involving the estimation of density ratio with respect to subsets of the training data and test data, respectively. The associated codes can be found at https://github.com/CDAlecsa/Generalized-KMM.  ( 2 min )
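    For readers unfamiliar with the starting point, classical Kernel Mean Matching estimates training-point weights by matching the kernel mean embedding of the reweighted training set to that of the test set; the generalized method in the note builds more general loss functions on top of this idea, and the linked repository contains the actual implementations. The sketch below is plain KMM with arbitrary hyperparameters, not the paper's generalized variant.

    ```python
    import numpy as np
    from scipy.optimize import minimize

    def rbf_kernel(X, Y, gamma=1.0):
        d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    def kmm_weights(X_train, X_test, gamma=1.0, B=10.0, eps=0.01):
        # Plain Kernel Mean Matching: find beta >= 0 minimising
        # 0.5 * beta^T K beta - kappa^T beta subject to box and sum constraints.
        n_tr, n_te = len(X_train), len(X_test)
        K = rbf_kernel(X_train, X_train, gamma)
        kappa = (n_tr / n_te) * rbf_kernel(X_train, X_test, gamma).sum(axis=1)

        objective = lambda b: 0.5 * b @ K @ b - kappa @ b
        constraints = [{"type": "ineq",
                        "fun": lambda b: eps * n_tr - abs(b.sum() - n_tr)}]
        result = minimize(objective, x0=np.ones(n_tr), bounds=[(0, B)] * n_tr,
                          constraints=constraints, method="SLSQP")
        return result.x   # estimated density ratios at the training points

    # Usage: weights = kmm_weights(X_train, X_test) with arrays of shape (n, d).
    ```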
    Correcting sampling biases via importance reweighting for spatial modeling. (arXiv:2309.04824v2 [cs.LG] UPDATED)
    In machine learning models, the estimation of errors is often complex due to distribution bias, particularly in spatial data such as those found in environmental studies. We introduce an approach based on the ideas of importance sampling to obtain an unbiased estimate of the target error. By taking into account the difference between the target distribution and the distribution of the available data, our method reweights errors at each sample point and neutralizes the shift. The importance sampling technique and kernel density estimation were used for reweighting. We validate the effectiveness of our approach using artificial data that resemble real-world spatial datasets. Our findings demonstrate the advantages of the proposed approach for the estimation of the target error, offering a solution to a distribution shift problem. The overall prediction error dropped from 7% to just 2%, and it decreases further for larger samples.  ( 2 min )
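    A minimal sketch of the reweighting step described above, under the assumption that both densities are estimated with Gaussian kernel density estimates (the abstract does not fix the estimator or bandwidth details): per-point errors from the biased sample are averaged with weights proportional to q(x)/p(x).

    ```python
    import numpy as np
    from scipy.stats import gaussian_kde

    def reweighted_error(X_sample, errors, X_target):
        # X_sample: (n, d) locations where errors were measured (biased sample)
        # errors:   (n,) per-point errors
        # X_target: (m, d) locations representative of where we care about error
        p = gaussian_kde(X_sample.T)    # density of the biased sample locations
        q = gaussian_kde(X_target.T)    # density of the target region
        w = q(X_sample.T) / np.clip(p(X_sample.T), 1e-12, None)
        w /= w.sum()
        return float(np.sum(w * errors))   # importance-reweighted error estimate
    ```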
    Domain Generalization for Crop Segmentation with Knowledge Distillation. (arXiv:2304.01029v2 [cs.CV] UPDATED)
    In recent years, precision agriculture has gradually oriented farming closer to automation processes to support all the activities related to field management. Service robotics plays a predominant role in this evolution by deploying autonomous agents that can navigate fields while performing tasks without human intervention, such as monitoring, spraying, and harvesting. To execute these precise actions, mobile robots need a real-time perception system that understands their surroundings and identifies their targets in the wild. Generalizing to new crops and environmental conditions is critical for practical applications, as labeled samples are rarely available. In this paper, we investigate the problem of crop segmentation and propose a novel approach to enhance domain generalization using knowledge distillation. In the proposed framework, we transfer knowledge from an ensemble of models individually trained on source domains to a student model that can adapt to unseen target domains. To evaluate the proposed method, we present a synthetic multi-domain dataset for crop segmentation containing plants of variegate shapes and covering different terrain styles, weather conditions, and light scenarios for more than 50,000 samples. We demonstrate significant improvements in performance over state-of-the-art methods and superior sim-to-real generalization. Our approach provides a promising solution for domain generalization in crop segmentation and has the potential to enhance a wide variety of precision agriculture applications.  ( 2 min )
    Adaptive approximation of monotone functions. (arXiv:2309.07530v1 [cs.LG])
    We study the classical problem of approximating a non-decreasing function $f: \mathcal{X} \to \mathcal{Y}$ in $L^p(\mu)$ norm by sequentially querying its values, for known compact real intervals $\mathcal{X}$, $\mathcal{Y}$ and a known probability measure $\mu$ on $\mathcal{X}$. For any function $f$ we characterize the minimum number of evaluations of $f$ that algorithms need to guarantee an approximation $\hat{f}$ with an $L^p(\mu)$ error below $\epsilon$ after stopping. Unlike worst-case results that hold uniformly over all $f$, our complexity measure is dependent on each specific function $f$. To address this problem, we introduce GreedyBox, a generalization of an algorithm originally proposed by Novak (1992) for numerical integration. We prove that GreedyBox achieves an optimal sample complexity for any function $f$, up to logarithmic factors. Additionally, we uncover results regarding piecewise-smooth functions. Perhaps as expected, the $L^p(\mu)$ error of GreedyBox decreases much faster for piecewise-$C^2$ functions than predicted by the algorithm (without any knowledge of the smoothness of $f$). A simple modification even achieves optimal minimax approximation rates for such functions, which we compute explicitly. In particular, our findings highlight multiple performance gaps between adaptive and non-adaptive algorithms, smooth and piecewise-smooth functions, as well as monotone or non-monotone functions. Finally, we provide numerical experiments to support our theoretical results.  ( 2 min )
    ConSpec: honing in on critical steps for rapid learning and generalization in RL. (arXiv:2210.05845v5 [cs.LG] UPDATED)
    In real life, success is often contingent upon multiple critical steps that are distant in time from each other and from the final reward. These critical steps are challenging to identify with traditional reinforcement learning (RL) methods that rely on the Bellman equation for credit assignment. Here, we present a new RL algorithm that uses offline contrastive learning to hone in on critical steps. This algorithm, which we call contrastive introspection (ConSpec), can be added to any existing RL algorithm. ConSpec learns a set of prototypes for the critical steps in a task by a novel contrastive loss and delivers an intrinsic reward when the current state matches one of these prototypes. The prototypes in ConSpec provide two key benefits for credit assignment: (1) They enable rapid identification of all the critical steps. (2) They do so in a readily interpretable manner, enabling out-of-distribution generalization when sensory features are altered. Distinct from other contemporary RL approaches to credit assignment, ConSpec takes advantage of the fact that it is easier to retrospectively identify the small set of steps that success is contingent upon than it is to prospectively predict reward at every step taken in the environment. Altogether, ConSpec improves learning in a diverse set of RL tasks, including both those with explicit, discrete critical steps and those with complex, continuous critical steps.  ( 3 min )
    A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time. (arXiv:2309.07418v1 [cs.DS])
    Large language models (LLMs) have played a pivotal role in revolutionizing various facets of our daily existence. Solving attention regression is a fundamental task in optimizing LLMs. In this work, we focus on giving a provable guarantee for the one-layer attention network objective function $L(X,Y) = \sum_{j_0 = 1}^n \sum_{i_0 = 1}^d ( \langle \langle \exp( \mathsf{A}_{j_0} x ) , {\bf 1}_n \rangle^{-1} \exp( \mathsf{A}_{j_0} x ), A_{3} Y_{*,i_0} \rangle - b_{j_0,i_0} )^2$. Here $\mathsf{A} \in \mathbb{R}^{n^2 \times d^2}$ is the Kronecker product between $A_1 \in \mathbb{R}^{n \times d}$ and $A_2 \in \mathbb{R}^{n \times d}$. $A_3$ is a matrix in $\mathbb{R}^{n \times d}$, and $\mathsf{A}_{j_0} \in \mathbb{R}^{n \times d^2}$ is the $j_0$-th block of $\mathsf{A}$. The $X, Y \in \mathbb{R}^{d \times d}$ are variables we want to learn. $B \in \mathbb{R}^{n \times d}$, $b_{j_0,i_0} \in \mathbb{R}$ is the entry at the $j_0$-th row and $i_0$-th column of $B$, $Y_{*,i_0} \in \mathbb{R}^d$ is the $i_0$-th column vector of $Y$, and $x \in \mathbb{R}^{d^2}$ is the vectorization of $X$. In a multi-layer LLM network, the matrix $B \in \mathbb{R}^{n \times d}$ can be viewed as the output of a layer, and $A_1= A_2 = A_3 \in \mathbb{R}^{n \times d}$ can be viewed as the input of a layer. The matrix version of $x$ can be viewed as $QK^\top$ and $Y$ can be viewed as $V$. We provide an iterative greedy algorithm to train the loss function $L(X,Y)$ up to $\epsilon$ accuracy that runs in $\widetilde{O}( ({\cal T}_{\mathrm{mat}}(n,n,d) + {\cal T}_{\mathrm{mat}}(n,d,d) + d^{2\omega}) \log(1/\epsilon) )$ time. Here ${\cal T}_{\mathrm{mat}}(a,b,c)$ denotes the time of multiplying an $a \times b$ matrix by another $b \times c$ matrix, and $\omega\approx 2.37$ denotes the exponent of matrix multiplication.  ( 3 min )
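    The objective can be transcribed directly into numpy to make the notation concrete. This sketch only evaluates $L(X,Y)$ and does not implement the paper's iterative greedy training algorithm; row-major flattening is assumed for the vectorization of $X$.

    ```python
    import numpy as np

    def softmax(v):
        e = np.exp(v - v.max())
        return e / e.sum()

    def attention_objective(X, Y, A1, A2, A3, B):
        n, d = A1.shape
        A = np.kron(A1, A2)                 # Kronecker product, shape (n^2, d^2)
        x = X.reshape(-1)                   # vec(X), row-major flattening assumed
        V = A3 @ Y                          # columns are A3 Y_{*, i0}
        loss = 0.0
        for j0 in range(n):
            s = softmax(A[j0 * n:(j0 + 1) * n] @ x)   # <exp(A_{j0} x), 1>^{-1} exp(A_{j0} x)
            loss += np.sum((s @ V - B[j0]) ** 2)      # sum over i0
        return loss

    # Example with small random inputs:
    n, d = 6, 3
    A1, A2, A3, B = (np.random.randn(n, d) for _ in range(4))
    X, Y = np.random.randn(d, d), np.random.randn(d, d)
    print(attention_objective(X, Y, A1, A2, A3, B))
    ```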
    HurriCast: An Automatic Framework Using Machine Learning and Statistical Modeling for Hurricane Forecasting. (arXiv:2309.07174v1 [cs.LG])
    Hurricanes present major challenges in the U.S. due to their devastating impacts. Mitigating these risks is important, and the insurance industry is central in this effort, using intricate statistical models for risk assessment. However, these models often neglect key temporal and spatial hurricane patterns and are limited by data scarcity. This study introduces a refined approach combining the ARIMA model and K-MEANS to better capture hurricane trends, and an Autoencoder for enhanced hurricane simulations. Our experiments show that this hybrid methodology effectively simulates historical hurricane behaviors while providing detailed projections of potential future trajectories and intensities. Moreover, by leveraging a comprehensive yet selective dataset, our simulations enrich the current understanding of hurricane patterns and offer actionable insights for risk management strategies.  ( 2 min )
    The kernel-balanced equation for deep neural networks. (arXiv:2309.07367v1 [cond-mat.dis-nn])
    Deep neural networks have shown many fruitful applications in this decade. A network can acquire a generalized function through training with a finite dataset. The degree of generalization reflects a proximity scale in the data space; however, this scale is not obvious when the dataset is complicated. Here we consider a network for estimating the distribution of the dataset. We show the estimation is unstable and the instability depends on the data density and training duration. We derive the kernel-balanced equation, which gives a short phenomenological description of the solution. The equation tells us the reason for the instability and the mechanism behind the scale. The network outputs a local average of the dataset as a prediction, and the scale of averaging is determined by the equation. The scale gradually decreases during training and finally results in instability in our case.  ( 2 min )
    LambdaKG: A Library for Pre-trained Language Model-Based Knowledge Graph Embeddings. (arXiv:2210.00305v3 [cs.CL] UPDATED)
    Knowledge Graphs (KGs) often have two characteristics: heterogeneous graph structure and text-rich entity/relation information. Text-based KG embeddings can represent entities by encoding descriptions with pre-trained language models, but no open-sourced library is specifically designed for KGs with PLMs at present. In this paper, we present LambdaKG, a library for KG embeddings that is equipped with many pre-trained language models (e.g., BERT, BART, T5, GPT-3) and supports various tasks (e.g., knowledge graph completion, question answering, recommendation, and knowledge probing). LambdaKG is publicly open-sourced at https://github.com/zjunlp/PromptKG/tree/main/lambdaKG, with a demo video at this http URL and long-term maintenance.  ( 2 min )
    Model-free Learning of Regions of Attraction via Recurrent Sets. (arXiv:2204.10372v2 [cs.LG] UPDATED)
    We consider the problem of learning an inner approximation of the region of attraction (ROA) of an asymptotically stable equilibrium point without an explicit model of the dynamics. Rather than leveraging approximate models with bounded uncertainty to find a (robust) invariant set contained in the ROA, we propose to learn sets that satisfy a more relaxed notion of containment known as recurrence. We define a set to be $\tau$-recurrent (resp. $k$-recurrent) if every trajectory that starts within the set, returns to it after at most $\tau$ seconds (resp. $k$ steps). We show that under mild assumptions a $\tau$-recurrent set containing a stable equilibrium must be a subset of its ROA. We then leverage this property to develop algorithms that compute inner approximations of the ROA using counter-examples of recurrence that are obtained by sampling finite-length trajectories. Our algorithms process samples sequentially, which allow them to continue being executed even after an initial offline training stage. We further provide an upper bound on the number of counter-examples used by the algorithm, and almost sure convergence guarantees.  ( 2 min )
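    A rough sketch of the sampling-based counter-example search described above, for the discrete-time ($k$-step) notion of recurrence. The toy dynamics, the ball-shaped candidate set, and the sample sizes are assumptions made for illustration; the paper's algorithms use such counter-examples to adapt the candidate set rather than only certify it.

    ```python
    import numpy as np

    class Ball:
        # Candidate set: a Euclidean ball around the origin (illustrative choice).
        def __init__(self, radius): self.radius = radius
        def sample(self, rng):
            v = rng.standard_normal(2)
            return self.radius * np.sqrt(rng.random()) * v / np.linalg.norm(v)
        def contains(self, x): return np.linalg.norm(x) <= self.radius

    def k_recurrence_counterexamples(step, candidate, k, n_samples=500, seed=0):
        # Collect sampled starting points whose trajectories fail to re-enter
        # the candidate set within k steps; an empty list is Monte-Carlo
        # evidence that the set is k-recurrent.
        rng = np.random.default_rng(seed)
        bad = []
        for _ in range(n_samples):
            x = candidate.sample(rng)
            y, returned = x.copy(), False
            for _ in range(k):
                y = step(y)
                if candidate.contains(y):
                    returned = True
                    break
            if not returned:
                bad.append(x)
        return bad

    step = lambda x: 0.9 * x                     # toy asymptotically stable dynamics
    print(len(k_recurrence_counterexamples(step, Ball(1.0), k=5)))   # expect 0
    ```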
    Neural Categorical Priors for Physics-Based Character Control. (arXiv:2308.07200v2 [cs.GR] UPDATED)
    Recent advances in learning reusable motion priors have demonstrated their effectiveness in generating naturalistic behaviors. In this paper, we propose a new learning framework in this paradigm for controlling physics-based characters with significantly improved motion quality and diversity over existing state-of-the-art methods. The proposed method uses reinforcement learning (RL) to initially track and imitate life-like movements from unstructured motion clips using the discrete information bottleneck, as adopted in the Vector Quantized Variational AutoEncoder (VQ-VAE). This structure compresses the most relevant information from the motion clips into a compact yet informative latent space, i.e., a discrete space over vector quantized codes. By sampling codes in the space from a trained categorical prior distribution, high-quality life-like behaviors can be generated, similar to the usage of VQ-VAE in computer vision. Although this prior distribution can be trained with the supervision of the encoder's output, it follows the original motion clip distribution in the dataset and could lead to imbalanced behaviors in our setting. To address the issue, we further propose a technique named prior shifting to adjust the prior distribution using curiosity-driven RL. The outcome distribution is demonstrated to offer sufficient behavioral diversity and significantly facilitates upper-level policy learning for downstream tasks. We conduct comprehensive experiments using humanoid characters on two challenging downstream tasks, sword-shield striking and two-player boxing game. Our results demonstrate that the proposed framework is capable of controlling the character to perform considerably high-quality movements in terms of behavioral strategies, diversity, and realism. Videos, codes, and data are available at https://tencent-roboticsx.github.io/NCP/.  ( 3 min )
    Feature Engineering in Learning-to-Rank for Community Question Answering Task. (arXiv:2309.07610v1 [cs.LG])
    Community question answering (CQA) forums are Internet-based platforms where users ask questions about a topic and other expert users try to provide solutions. Many CQA forums, such as Quora, Stackoverflow, Yahoo!Answer, and StackExchange, exist with a large amount of user-generated data. These data are leveraged in automated CQA ranking systems where similar questions (and answers) are presented in response to the query of the user. In this work, we empirically investigate a few aspects of this domain. Firstly, in addition to traditional features like TF-IDF, BM25, etc., we introduce a BERT-based feature that captures the semantic similarity between the question and answer. Secondly, most of the existing research works have focused on features extracted only from the question part; features extracted from answers have not been explored extensively. We combine both types of features in a linear fashion. Thirdly, using our proposed concepts, we conduct an empirical investigation with different rank-learning algorithms, some of which have not been used so far in the CQA domain. On three standard CQA datasets, our proposed framework achieves state-of-the-art performance. We also analyze the importance of the features we use in our investigation. This work is expected to guide practitioners to select a better set of features for the CQA retrieval task.  ( 2 min )
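    A sketch of the kind of per-candidate feature vector described above: lexical similarity features computed from both the question and the answer side, plus one BERT-style semantic-similarity feature, later combined linearly by a rank learner. The `bert_encode` callable stands in for any sentence encoder returning a fixed-size vector, and the specific features and learned weights in the paper are not reproduced here.

    ```python
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    def qa_feature_vector(query, question, answer, bert_encode):
        # In practice the TF-IDF vectorizer would be fit once on the whole
        # collection; fitting on three strings here keeps the sketch self-contained.
        tfidf = TfidfVectorizer().fit([query, question, answer])
        vecs = tfidf.transform([query, question, answer])
        lexical_q = cosine_similarity(vecs[0], vecs[1])[0, 0]   # query vs. question
        lexical_a = cosine_similarity(vecs[0], vecs[2])[0, 0]   # query vs. answer
        semantic = cosine_similarity(bert_encode(query).reshape(1, -1),
                                     bert_encode(question + " " + answer).reshape(1, -1))[0, 0]
        return np.array([lexical_q, lexical_a, semantic])

    # A learning-to-rank algorithm is then trained on such vectors; a linear
    # ranker simply scores each candidate with a learned weighted sum of them.
    ```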
    Using Unsupervised and Supervised Learning and Digital Twin for Deep Convective Ice Storm Classification. (arXiv:2309.07173v1 [cs.LG])
    Smart Ice Cloud Sensing (SMICES) is a small-sat concept in which a primary radar intelligently targets ice storms based on information collected by a lookahead radiometer. Critical to the intelligent targeting is accurate identification of storm/cloud types from eight bands of radiance collected by the radiometer. The cloud types of interest are: clear sky, thin cirrus, cirrus, rainy anvil, and convection core. We describe a multi-step use of machine learning and a digital twin of the Earth's atmosphere to derive such a classifier. First, a digital twin of Earth's atmosphere called the Weather Research and Forecasting (WRF) model is used to generate simulated lookahead radiometer data as well as deeper "science" hidden variables. The datasets simulate a tropical region over the Caribbean and a non-tropical region over the Atlantic coast of the United States. K-means clustering over the scientific hidden variables was utilized by human experts, informed by the cluster means/centroids of the hidden variables, to automatically label the data, mapping each physical data point to a cloud type. Next, classifiers were trained with the simulated radiometer data and their corresponding labels as inputs. Random decision forest (RDF), support vector machine (SVM), Gaussian na\"ive Bayes, feed-forward artificial neural network (ANN), and convolutional neural network (CNN) classifiers were trained. Over the tropical dataset, the best performing classifier was able to identify non-storm and storm clouds with over 80% accuracy in each class for a held-out test set. Over the non-tropical dataset, the best performing classifier was able to classify non-storm clouds with over 90% accuracy and storm clouds with over 40% accuracy. Additionally, both sets of classifiers were shown to be resilient to instrument noise.  ( 3 min )
    Goal Space Abstraction in Hierarchical Reinforcement Learning via Reachability Analysis. (arXiv:2309.07168v1 [cs.LG])
    Open-ended learning benefits immensely from the use of symbolic methods for goal representation as they offer ways to structure knowledge for efficient and transferable learning. However, the existing Hierarchical Reinforcement Learning (HRL) approaches relying on symbolic reasoning are often limited as they require a manual goal representation. The challenge in autonomously discovering a symbolic goal representation is that it must preserve critical information, such as the environment dynamics. In this work, we propose a developmental mechanism for subgoal discovery via an emergent representation that abstracts (i.e., groups together) sets of environment states that have similar roles in the task. We create a HRL algorithm that gradually learns this representation along with the policies and evaluate it on navigation tasks to show the learned representation is interpretable and results in data efficiency.  ( 2 min )
    CloudBrain-NMR: An Intelligent Cloud Computing Platform for NMR Spectroscopy Processing, Reconstruction and Analysis. (arXiv:2309.07178v1 [q-bio.QM])
    Nuclear Magnetic Resonance (NMR) spectroscopy has served as a powerful analytical tool for studying molecular structure and dynamics in chemistry and biology. However, the processing of raw data acquired from NMR spectrometers and the subsequent quantitative analysis involve various specialized tools, which necessitate comprehensive knowledge of programming and NMR. In particular, emerging deep learning tools are hard to use widely in NMR due to their sophisticated computational setup. Thus, NMR processing is not an easy task for chemists and biologists. In this work, we present CloudBrain-NMR, an intelligent online cloud computing platform designed for NMR data reading, processing, reconstruction, and quantitative analysis. The platform is conveniently accessed through a web browser, eliminating the need for any program installation on the user side. CloudBrain-NMR uses parallel computing with graphics processing units and central processing units, resulting in significantly shortened computation time. Furthermore, it incorporates state-of-the-art deep learning-based algorithms offering comprehensive functionalities that allow users to complete the entire processing procedure without relying on additional software. This platform has empowered NMR applications with advanced artificial intelligence processing. CloudBrain-NMR is openly accessible for free usage at https://csrc.xmu.edu.cn/CloudBrain.html  ( 2 min )
    Market-GAN: Adding Control to Financial Market Data Generation with Semantic Context. (arXiv:2309.07708v1 [cs.LG])
    Financial simulators play an important role in enhancing forecasting accuracy, managing risks, and fostering strategic financial decision-making. Despite the development of financial market simulation methodologies, existing frameworks often struggle with adapting to specialized simulation contexts. We pinpoint the challenges as i) current financial datasets do not contain context labels; ii) current techniques are not designed to generate financial data with context as control, which demands greater precision compared to other modalities; iii) the inherent difficulties in generating context-aligned, high-fidelity data given the non-stationary, noisy nature of financial data. To address these challenges, our contributions are: i) we propose the Contextual Market Dataset with market dynamics, stock ticker, and history state as context, leveraging a market dynamics modeling method that combines linear regression and Dynamic Time Warping clustering to extract market dynamics; ii) we present Market-GAN, a novel architecture incorporating a Generative Adversarial Network (GAN) for controllable generation with context, an autoencoder for learning low-dimension features, and supervisors for knowledge transfer; iii) we introduce a two-stage training scheme to ensure that Market-GAN captures the intrinsic market distribution with multiple objectives. In the pretraining stage, with the use of the autoencoder and supervisors, we prepare the generator with a better initialization for the adversarial training stage. We propose a set of holistic evaluation metrics that consider alignment, fidelity, data usability on downstream tasks, and market facts. We evaluate Market-GAN with the Dow Jones Industrial Average data from 2000 to 2023 and showcase superior performance in comparison to 4 state-of-the-art time-series generative models.  ( 3 min )
    Traveling Words: A Geometric Interpretation of Transformers. (arXiv:2309.07315v1 [cs.CL])
    Transformers have significantly advanced the field of natural language processing, but comprehending their internal mechanisms remains a challenge. In this paper, we introduce a novel geometric perspective that elucidates the inner mechanisms of transformer operations. Our primary contribution is illustrating how layer normalization confines the latent features to a hyper-sphere, subsequently enabling attention to mold the semantic representation of words on this surface. This geometric viewpoint seamlessly connects established properties such as iterative refinement and contextual embeddings. We validate our insights by probing a pre-trained 124M parameter GPT-2 model. Our findings reveal clear query-key attention patterns in early layers and build upon prior observations regarding the subject-specific nature of attention heads at deeper layers. Harnessing these geometric insights, we present an intuitive understanding of transformers, depicting them as processes that model the trajectory of word particles along the hyper-sphere.  ( 2 min )
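    The central geometric claim is easy to check numerically: after layer normalization (ignoring the learned affine parameters), every hidden vector has zero mean and unit variance across its $d$ coordinates, so its Euclidean norm is exactly $\sqrt{d}$ and all tokens lie on a hyper-sphere of that radius. The dimensions below are arbitrary; 768 is simply GPT-2 small's hidden size.

    ```python
    import numpy as np

    d = 768                                           # e.g. GPT-2's hidden size
    x = np.random.randn(5, d) * 3.0 + 1.5             # arbitrary hidden states
    normed = (x - x.mean(-1, keepdims=True)) / x.std(-1, keepdims=True)
    print(np.linalg.norm(normed, axis=-1))            # each entry = sqrt(768) ~ 27.7
    ```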
  • Open

    Random Feature Amplification: Feature Learning and Generalization in Neural Networks. (arXiv:2202.07626v4 [cs.LG] UPDATED)
    In this work, we provide a characterization of the feature-learning process in two-layer ReLU networks trained by gradient descent on the logistic loss following random initialization. We consider data with binary labels that are generated by an XOR-like function of the input features. We permit a constant fraction of the training labels to be corrupted by an adversary. We show that, although linear classifiers are no better than random guessing for the distribution we consider, two-layer ReLU networks trained by gradient descent achieve generalization error close to the label noise rate. We develop a novel proof technique that shows that at initialization, the vast majority of neurons function as random features that are only weakly correlated with useful features, and the gradient descent dynamics 'amplify' these weak, random features to strong, useful features.
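    A small empirical sketch of the setting described above; the dimensions, the 10% label-flip rate, and the network width are arbitrary choices, and the off-the-shelf optimizer stands in for the plain gradient descent analyzed in the paper. Labels follow an XOR-like function of two coordinates, a linear classifier stays near chance, and a two-layer ReLU network reaches test accuracy well above chance despite the corrupted labels.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.neural_network import MLPClassifier

    rng = np.random.default_rng(0)
    X = rng.standard_normal((4000, 10))
    y = (np.sign(X[:, 0]) * np.sign(X[:, 1]) > 0).astype(int)   # XOR-like rule on two coordinates
    y_train = y[:2000].copy()
    flip = rng.random(2000) < 0.1                               # 10% corrupted training labels
    y_train[flip] ^= 1

    linear = LogisticRegression().fit(X[:2000], y_train)
    relu_net = MLPClassifier(hidden_layer_sizes=(200,), max_iter=2000,
                             random_state=0).fit(X[:2000], y_train)
    print("linear test acc:", linear.score(X[2000:], y[2000:]))      # ~0.5 (chance)
    print("2-layer ReLU acc:", relu_net.score(X[2000:], y[2000:]))   # typically ~0.9 or higher
    ```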
    Learning nonparametric DAGs with incremental information via high-order HSIC. (arXiv:2308.05969v2 [cs.LG] UPDATED)
    Score-based methods for learning Bayesian networks (BNs) aim to maximize a global score function. However, if local variables exhibit both direct and indirect dependence simultaneously, global optimization of the score function misses edges between variables with an indirect dependence relationship, whose scores are smaller than those of directly dependent pairs. In this paper, we present an identifiability condition based on a determined subset of parents to identify the underlying DAG. Using this identifiability condition, we develop a two-phase algorithm, namely the optimal-tuning (OT) algorithm, to locally amend the global optimization. In the optimal phase, an optimization problem based on the first-order Hilbert-Schmidt independence criterion (HSIC) gives an estimated skeleton as the initial determined parents subset. In the tuning phase, the skeleton is locally tuned by deletion, addition, and DAG-formalization strategies using the theoretically proved incremental properties of high-order HSIC. Numerical experiments on different synthetic datasets and real-world datasets show that the OT algorithm outperforms existing methods. Especially in the Sigmoid Mix model with the size of the graph being ${\rm\bf d=40}$, the structural intervention distance (SID) of the OT algorithm is 329.7 smaller than that obtained by CAM, which indicates that the graph estimated by the OT algorithm misses fewer edges compared with CAM. Source code of the OT algorithm is available at https://github.com/YafeiannWang/optimal-tune-algorithm.
    SC-MAD: Mixtures of Higher-order Networks for Data Augmentation. (arXiv:2309.07453v1 [stat.ML])
    The myriad complex systems with multiway interactions motivate the extension of graph-based pairwise connections to higher-order relations. In particular, the simplicial complex has inspired generalizations of graph neural networks (GNNs) to simplicial complex-based models. Learning on such systems requires large amounts of data, which can be expensive or impossible to obtain. We propose data augmentation of simplicial complexes through both linear and nonlinear mixup mechanisms that return mixtures of existing labeled samples. In addition to traditional pairwise mixup, we present a convex clustering mixup approach for a data-driven relationship among several simplicial complexes. We theoretically demonstrate that the resultant synthetic simplicial complexes interpolate among existing data with respect to homomorphism densities. Our method is demonstrated on both synthetic and real-world datasets for simplicial complex classification.
    Multi-Source Domain Adaptation meets Dataset Distillation through Dataset Dictionary Learning. (arXiv:2309.07666v1 [cs.LG])
    In this paper, we consider the intersection of two problems in machine learning: Multi-Source Domain Adaptation (MSDA) and Dataset Distillation (DD). On the one hand, the first considers adapting multiple heterogeneous labeled source domains to an unlabeled target domain. On the other hand, the second attacks the problem of synthesizing a small summary containing all the information about the datasets. We thus consider a new problem called MSDA-DD. To solve it, we adapt previous works in the MSDA literature, such as Wasserstein Barycenter Transport and Dataset Dictionary Learning, as well as DD method Distribution Matching. We thoroughly experiment with this novel problem on four benchmarks (Caltech-Office 10, Tennessee-Eastman Process, Continuous Stirred Tank Reactor, and Case Western Reserve University), where we show that, even with as little as 1 sample per class, one achieves state-of-the-art adaptation performance.
    Dataset Size Dependence of Rate-Distortion Curve and Threshold of Posterior Collapse in Linear VAE. (arXiv:2309.07663v1 [stat.ML])
    In the Variational Autoencoder (VAE), the variational posterior often aligns closely with the prior, which is known as posterior collapse and hinders the quality of representation learning. To mitigate this problem, an adjustable hyperparameter beta has been introduced in the VAE. This paper presents a closed-form expression to assess the relationship between the beta in VAE, the dataset size, the posterior collapse, and the rate-distortion curve by analyzing a minimal VAE in a high-dimensional limit. These results clarify that a long plateau in the generalization error emerges with a relatively larger beta. As the beta increases, the length of the plateau extends and then becomes infinite beyond a certain beta threshold. This implies that the choice of beta, unlike the usual regularization parameters, can induce posterior collapse regardless of the dataset size. Thus, beta is a risky parameter that requires careful tuning. Furthermore, considering the dataset-size dependence on the rate-distortion curve, a relatively large dataset is required to obtain a rate-distortion curve with high rates. Extensive numerical experiments support our analysis.
    Trajectory-oriented optimization of stochastic epidemiological models. (arXiv:2305.03926v3 [stat.AP] UPDATED)
    Epidemiological models must be calibrated to ground truth for downstream tasks such as producing forward projections or running what-if scenarios. The meaning of calibration changes in the case of a stochastic model, since the output of such a model is generally described via an ensemble or a distribution. Each member of the ensemble is usually mapped to a random number seed (explicitly or implicitly). With the goal of finding not only the input parameter settings but also the random seeds that are consistent with the ground truth, we propose a class of Gaussian process (GP) surrogates along with an optimization strategy based on Thompson sampling. This Trajectory Oriented Optimization (TOO) approach produces actual trajectories close to the empirical observations instead of a set of parameter settings where only the mean simulation behavior matches with the ground truth.
    Nowhere coexpanding functions. (arXiv:2303.12814v3 [stat.ML] UPDATED)
    We define a family of $C^1$ functions which we call "nowhere coexpanding functions" that is closed under composition and includes all $C^3$ functions with non-positive Schwarzian derivative. We establish results on the number and nature of the fixed points of these functions, including a generalisation of a classic result of Singer.
    Simultaneous inference for generalized linear models with unmeasured confounders. (arXiv:2309.07261v1 [stat.ME])
    Tens of thousands of simultaneous hypothesis tests are routinely performed in genomic studies to identify differentially expressed genes. However, due to unmeasured confounders, many standard statistical approaches may be substantially biased. This paper investigates the large-scale hypothesis testing problem for multivariate generalized linear models in the presence of confounding effects. Under arbitrary confounding mechanisms, we propose a unified statistical estimation and inference framework that harnesses orthogonal structures and integrates linear projections into three key stages. It first leverages multivariate responses to separate marginal and uncorrelated confounding effects, recovering the confounding coefficients' column space. Subsequently, latent factors and primary effects are jointly estimated, utilizing $\ell_1$-regularization for sparsity while imposing orthogonality onto confounding coefficients. Finally, we incorporate projected and weighted bias-correction steps for hypothesis testing. Theoretically, we establish various effects' identification conditions and non-asymptotic error bounds. We show effective Type-I error control of asymptotic $z$-tests as sample and response sizes approach infinity. Numerical experiments demonstrate that the proposed method controls the false discovery rate by the Benjamini-Hochberg procedure and is more powerful than alternative methods. By comparing single-cell RNA-seq counts from two groups of samples, we demonstrate the suitability of adjusting confounding effects when significant covariates are absent from the model.
    Speeding up Learning Quantum States through Group Equivariant Convolutional Quantum Ans\"atze. (arXiv:2112.07611v3 [quant-ph] UPDATED)
    We develop a theoretical framework for $S_n$-equivariant convolutional quantum circuits with SU$(d)$-symmetry, building on and significantly generalizing Jordan's Permutational Quantum Computing (PQC) formalism based on Schur-Weyl duality connecting both SU$(d)$ and $S_n$ actions on qudits. In particular, we utilize the Okounkov-Vershik approach to prove Harrow's statement (Ph.D. Thesis 2005 p.160) on the equivalence between $\operatorname{SU}(d)$ and $S_n$ irrep bases and to establish the $S_n$-equivariant Convolutional Quantum Alternating Ans\"atze ($S_n$-CQA) using Young-Jucys-Murphy (YJM) elements. We prove that $S_n$-CQA is able to generate any unitary in any given $S_n$ irrep sector, which may serve as a universal model for a wide array of quantum machine learning problems in the presence of SU($d$) symmetry. Our method provides another way to prove the universality of the Quantum Approximate Optimization Algorithm (QAOA) and verifies that 4-local SU($d$) symmetric unitaries are sufficient to build generic SU($d$) symmetric quantum circuits up to relative phase factors. We present numerical simulations to showcase the effectiveness of the ans\"atze in finding the ground state energy of the $J_1$--$J_2$ antiferromagnetic Heisenberg model on the rectangular and Kagome lattices. Our work provides the first application of the celebrated Okounkov-Vershik $S_n$ representation theory to quantum physics and machine learning, from which we propose quantum variational ans\"atze that are strongly suggested to be classically intractable and are tailored towards specific optimization problems.
    Reliability-based cleaning of noisy training labels with inductive conformal prediction in multi-modal biomedical data mining. (arXiv:2309.07332v1 [cs.LG])
    Accurately labeling biomedical data presents a challenge. Traditional semi-supervised learning methods often under-utilize available unlabeled data. To address this, we propose a novel reliability-based training data cleaning method employing inductive conformal prediction (ICP). This method capitalizes on a small set of accurately labeled training data and leverages ICP-calculated reliability metrics to rectify mislabeled data and outliers within vast quantities of noisy training data. The efficacy of the method is validated across three classification tasks within distinct modalities: filtering drug-induced-liver-injury (DILI) literature with title and abstract, predicting ICU admission of COVID-19 patients through CT radiomics and electronic health records, and subtyping breast cancer using RNA-sequencing data. Varying levels of noise were introduced into the training labels through label permutation. Results show significant enhancements in classification performance: accuracy enhancement in 86 out of 96 DILI experiments (up to 11.4%), AUROC and AUPRC enhancements in all 48 COVID-19 experiments (up to 23.8% and 69.8%), and accuracy and macro-average F1 score improvements in 47 out of 48 RNA-sequencing experiments (up to 74.6% and 89.0%). Our method offers the potential to substantially boost classification performance in multi-modal biomedical machine learning tasks. Importantly, it accomplishes this without necessitating an excessive volume of meticulously curated training data.  ( 3 min )
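    A condensed sketch of the reliability-scoring step under typical inductive-conformal conventions; the nonconformity function, the classifier, and the calibration split here are illustrative assumptions rather than the paper's exact choices. A model trained on part of the trusted data provides nonconformity scores, a held-out calibration split turns them into p-values, and noisy-set labels with small p-values are flagged for relabeling or removal.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def flag_unreliable_labels(X_clean, y_clean, X_noisy, y_noisy, significance=0.05):
        n_cal = len(X_clean) // 3                        # calibration split (assumption)
        model = RandomForestClassifier(n_estimators=200, random_state=0)
        model.fit(X_clean[n_cal:], y_clean[n_cal:])

        def nonconformity(X, y):
            proba = model.predict_proba(X)
            cols = np.searchsorted(model.classes_, y)    # column of the observed label
            return 1.0 - proba[np.arange(len(y)), cols]

        cal = nonconformity(X_clean[:n_cal], y_clean[:n_cal])
        test = nonconformity(X_noisy, y_noisy)
        # Conformal p-value: fraction of calibration scores at least as large.
        p = (1.0 + (cal[None, :] >= test[:, None]).sum(axis=1)) / (len(cal) + 1.0)
        return p < significance                          # True = label looks unreliable
    ```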
    Choosing a Proxy Metric from Past Experiments. (arXiv:2309.07893v1 [stat.ME])
    In many randomized experiments, the treatment effect of the long-term metric (i.e. the primary outcome of interest) is often difficult or infeasible to measure. Such long-term metrics are often slow to react to changes and sufficiently noisy that they are challenging to faithfully estimate in short-horizon experiments. A common alternative is to measure several short-term proxy metrics in the hope that they closely track the long-term metric -- so they can be used to effectively guide decision-making in the near term. We introduce a new statistical framework to both define and construct an optimal proxy metric for use in a homogeneous population of randomized experiments. Our procedure first reduces the construction of an optimal proxy metric in a given experiment to a portfolio optimization problem which depends on the true latent treatment effects and noise level of the experiment under consideration. We then denoise the observed treatment effects of the long-term metric and a set of proxies in a historical corpus of randomized experiments to extract estimates of the latent treatment effects for use in the optimization problem. One key insight derived from our approach is that the optimal proxy metric for a given experiment is not a priori fixed; rather it should depend on the sample size (or effective noise level) of the randomized experiment for which it is deployed. To instantiate and evaluate our framework, we employ our methodology in a large corpus of randomized experiments from an industrial recommendation system and construct proxy metrics that perform favorably relative to several baselines.  ( 2 min )
    Joint Community Detection and Rotational Synchronization via Semidefinite Programming. (arXiv:2105.06031v2 [stat.ML] UPDATED)
    In the presence of heterogeneous data, where randomly rotated objects fall into multiple underlying categories, it is challenging to simultaneously classify them into clusters and synchronize them based on pairwise relations. This gives rise to the joint problem of community detection and synchronization. We propose a series of semidefinite relaxations, and prove their exact recovery when extending the celebrated stochastic block model to this new setting where both rotations and cluster identities are to be determined. Numerical experiments demonstrate the efficacy of our proposed algorithms and confirm our theoretical result which indicates a sharp phase transition for exact recovery.  ( 2 min )
    Beta Diffusion. (arXiv:2309.07867v1 [cs.LG])
    We introduce beta diffusion, a novel generative modeling method that integrates demasking and denoising to generate data within bounded ranges. Using scaled and shifted beta distributions, beta diffusion utilizes multiplicative transitions over time to create both forward and reverse diffusion processes, maintaining beta distributions in both the forward marginals and the reverse conditionals, given the data at any point in time. Unlike traditional diffusion-based generative models relying on additive Gaussian noise and reweighted evidence lower bounds (ELBOs), beta diffusion is multiplicative and optimized with KL-divergence upper bounds (KLUBs) derived from the convexity of the KL divergence. We demonstrate that the proposed KLUBs are more effective for optimizing beta diffusion compared to negative ELBOs, which can also be derived as the KLUBs of the same KL divergence with its two arguments swapped. The loss function of beta diffusion, expressed in terms of Bregman divergence, further supports the efficacy of KLUBs for optimization. Experimental results on both synthetic data and natural images demonstrate the unique capabilities of beta diffusion in generative modeling of range-bounded data and validate the effectiveness of KLUBs in optimizing diffusion models, thereby making them valuable additions to the family of diffusion-based generative models and the optimization techniques used to train them.  ( 2 min )
    A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time. (arXiv:2309.07418v1 [cs.DS])
    Large language models (LLMs) have played a pivotal role in revolutionizing various facets of our daily existence. Solving attention regression is a fundamental task in optimizing LLMs. In this work, we focus on giving a provable guarantee for the one-layer attention network objective function $L(X,Y) = \sum_{j_0 = 1}^n \sum_{i_0 = 1}^d ( \langle \langle \exp( \mathsf{A}_{j_0} x ) , {\bf 1}_n \rangle^{-1} \exp( \mathsf{A}_{j_0} x ), A_{3} Y_{*,i_0} \rangle - b_{j_0,i_0} )^2$. Here $\mathsf{A} \in \mathbb{R}^{n^2 \times d^2}$ is Kronecker product between $A_1 \in \mathbb{R}^{n \times d}$ and $A_2 \in \mathbb{R}^{n \times d}$. $A_3$ is a matrix in $\mathbb{R}^{n \times d}$, $\mathsf{A}_{j_0} \in \mathbb{R}^{n \times d^2}$ is the $j_0$-th block of $\mathsf{A}$. The $X, Y \in \mathbb{R}^{d \times d}$ are variables we want to learn. $B \in \mathbb{R}^{n \times d}$ and $b_{j_0,i_0} \in \mathbb{R}$ is one entry at $j_0$-th row and $i_0$-th column of $B$, $Y_{*,i_0} \in \mathbb{R}^d$ is the $i_0$-column vector of $Y$, and $x \in \mathbb{R}^{d^2}$ is the vectorization of $X$. In a multi-layer LLM network, the matrix $B \in \mathbb{R}^{n \times d}$ can be viewed as the output of a layer, and $A_1= A_2 = A_3 \in \mathbb{R}^{n \times d}$ can be viewed as the input of a layer. The matrix version of $x$ can be viewed as $QK^\top$ and $Y$ can be viewed as $V$. We provide an iterative greedy algorithm to train loss function $L(X,Y)$ up $\epsilon$ that runs in $\widetilde{O}( ({\cal T}_{\mathrm{mat}}(n,n,d) + {\cal T}_{\mathrm{mat}}(n,d,d) + d^{2\omega}) \log(1/\epsilon) )$ time. Here ${\cal T}_{\mathrm{mat}}(a,b,c)$ denotes the time of multiplying $a \times b$ matrix another $b \times c$ matrix, and $\omega\approx 2.37$ denotes the exponent of matrix multiplication.  ( 3 min )
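    For concreteness, here is a literal NumPy evaluation of the stated objective with tiny random matrices; it only illustrates the shapes and indexing in the abstract (the row-major flattening used for vec(X) is an assumption), not the paper's greedy training algorithm.

        # Toy NumPy evaluation of the one-layer attention regression objective
        # L(X, Y) as defined above; shapes follow the abstract's notation.
        import numpy as np

        n, d = 4, 3
        rng = np.random.default_rng(0)
        A1, A2, A3, B = (rng.normal(size=(n, d)) for _ in range(4))
        X, Y = rng.normal(size=(d, d)), rng.normal(size=(d, d))

        A = np.kron(A1, A2)              # A in R^{n^2 x d^2}
        x = X.reshape(-1)                # vec(X) in R^{d^2}; row-major here, conventions may differ

        loss = 0.0
        for j0 in range(n):
            A_j0 = A[j0 * n:(j0 + 1) * n, :]          # j0-th block, in R^{n x d^2}
            u = np.exp(A_j0 @ x)                      # exp(A_{j0} x), in R^n
            softmax = u / u.sum()                     # <exp(.), 1_n>^{-1} exp(.)
            for i0 in range(d):
                loss += (softmax @ (A3 @ Y[:, i0]) - B[j0, i0]) ** 2
        print(loss)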
    Kernel Conditional Moment Constraints for Confounding Robust Inference. (arXiv:2302.13348v2 [stat.ML] UPDATED)
    We study policy evaluation of offline contextual bandits subject to unobserved confounders. Sensitivity analysis methods are commonly used to estimate the policy value under the worst-case confounding over a given uncertainty set. However, existing work often resorts to some coarse relaxation of the uncertainty set for the sake of tractability, leading to overly conservative estimation of the policy value. In this paper, we propose a general estimator that provides a sharp lower bound of the policy value. It can be shown that our estimator contains the recently proposed sharp estimator by Dorn and Guo (2022) as a special case, and our method enables a novel extension of the classical marginal sensitivity model using f-divergence. To construct our estimator, we leverage the kernel method to obtain a tractable approximation to the conditional moment constraints, which traditional non-sharp estimators failed to take into account. In the theoretical analysis, we provide a condition for the choice of the kernel which guarantees no specification error that biases the lower bound estimation. Furthermore, we provide consistency guarantees of policy evaluation and learning. In the experiments with synthetic and real-world data, we demonstrate the effectiveness of the proposed method.  ( 2 min )
    All you need is spin: SU(2) equivariant variational quantum circuits based on spin networks. (arXiv:2309.07250v1 [quant-ph])
    Variational algorithms require architectures that naturally constrain the optimisation space to run efficiently. In geometric quantum machine learning, one achieves this by encoding group structure into parameterised quantum circuits to include the symmetries of a problem as an inductive bias. However, constructing such circuits is challenging as a concrete guiding principle has yet to emerge. In this paper, we propose the use of spin networks, a form of directed tensor network invariant under a group transformation, to devise SU(2) equivariant quantum circuit ans\"atze -- circuits possessing spin rotation symmetry. By changing to the basis that block diagonalises SU(2) group action, these networks provide a natural building block for constructing parameterised equivariant quantum circuits. We prove that our construction is mathematically equivalent to other known constructions, such as those based on twirling and generalised permutations, but more direct to implement on quantum hardware. The efficacy of our constructed circuits is tested by solving the ground state problem of SU(2) symmetric Heisenberg models on the one-dimensional triangular lattice and on the Kagome lattice. Our results highlight that our equivariant circuits boost the performance of quantum variational algorithms, indicating broader applicability to other real-world problems.  ( 2 min )
    Deep Nonparametric Convexified Filtering for Computational Photography, Image Synthesis and Adversarial Defense. (arXiv:2309.06724v2 [cs.CV] UPDATED)
    We aim to provide a general framework for computational photography that recovers the real scene from imperfect images, via Deep Nonparametric Convexified Filtering (DNCF). It consists of a nonparametric deep network that resembles the physical equations behind image formation, such as denoising, super-resolution, inpainting, and flash. DNCF has no parameterization dependent on training data and therefore has strong generalization and robustness to adversarial image manipulation. During inference, we also encourage the network parameters to be nonnegative and create a bi-convex function of the input and parameters, which makes second-order optimization algorithms applicable under tight running-time budgets, yielding a 10X acceleration over Deep Image Prior. With these tools, we empirically verify its capability to defend image classification deep networks against adversarial attack algorithms in real time.  ( 2 min )
    On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems. (arXiv:1906.00331v9 [cs.LG] UPDATED)
    We consider nonconvex-concave minimax problems, $\min_{\mathbf{x}} \max_{\mathbf{y} \in \mathcal{Y}} f(\mathbf{x}, \mathbf{y})$, where $f$ is nonconvex in $\mathbf{x}$ but concave in $\mathbf{y}$ and $\mathcal{Y}$ is a convex and bounded set. One of the most popular algorithms for solving this problem is the celebrated gradient descent ascent (GDA) algorithm, which has been widely used in machine learning, control theory and economics. Despite the extensive convergence results for the convex-concave setting, GDA with equal stepsize can converge to limit cycles or even diverge in a general setting. In this paper, we present the complexity results on two-time-scale GDA for solving nonconvex-concave minimax problems, showing that the algorithm can find a stationary point of the function $\Phi(\cdot) := \max_{\mathbf{y} \in \mathcal{Y}} f(\cdot, \mathbf{y})$ efficiently. To the best of our knowledge, this is the first nonasymptotic analysis for two-time-scale GDA in this setting, shedding light on its superior practical performance in training generative adversarial networks (GANs) and other real applications.  ( 3 min )
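    A toy sketch of the two-time-scale idea on a small nonconvex-concave example (the function, step sizes, and projection set below are illustrative choices, not the paper's setting): the ascent step size on y is taken much larger than the descent step size on x.

        # Two-time-scale GDA on f(x, y) = sin(x) * y - 0.5 * y**2,
        # which is nonconvex in x, concave in y, with Y = [-2, 2].
        import numpy as np

        eta_x, eta_y = 1e-3, 1e-1        # descent stepsize << ascent stepsize
        x, y = 2.0, 0.0
        for _ in range(20000):
            gx = np.cos(x) * y           # df/dx
            gy = np.sin(x) - y           # df/dy
            x -= eta_x * gx              # slow descent on x
            y = np.clip(y + eta_y * gy, -2.0, 2.0)   # fast projected ascent on y
        print(x, y)   # x approaches a stationary point of Phi(x) = max_y f(x, y)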
    Spectrum-Aware Adjustment: A New Debiasing Framework with Applications to Principal Components Regression. (arXiv:2309.07810v1 [math.ST])
    We introduce a new debiasing framework for high-dimensional linear regression that bypasses the restrictions on covariate distributions imposed by modern debiasing technology. We study the prevalent setting where the number of features and samples are both large and comparable. In this context, state-of-the-art debiasing technology uses a degrees-of-freedom correction to remove shrinkage bias of regularized estimators and conduct inference. However, this method requires that the observed samples are i.i.d., the covariates follow a mean zero Gaussian distribution, and reliable covariance matrix estimates for observed features are available. This approach struggles when (i) covariates are non-Gaussian with heavy tails or asymmetric distributions, (ii) rows of the design exhibit heterogeneity or dependencies, and (iii) reliable feature covariance estimates are lacking. To address these, we develop a new strategy where the debiasing correction is a rescaled gradient descent step (suitably initialized) with step size determined by the spectrum of the sample covariance matrix. Unlike prior work, we assume that eigenvectors of this matrix are uniform draws from the orthogonal group. We show this assumption remains valid in diverse situations where traditional debiasing fails, including designs with complex row-column dependencies, heavy tails, asymmetric properties, and latent low-rank structures. We establish asymptotic normality of our proposed estimator (centered and scaled) under various convergence notions. Moreover, we develop a consistent estimator for its asymptotic variance. Lastly, we introduce a debiased Principal Component Regression (PCR) technique using our Spectrum-Aware approach. In varied simulations and real data experiments, we observe that our method outperforms degrees-of-freedom debiasing by a margin.  ( 3 min )
    Survival Estimation for Missing not at Random Censoring Indicators based on Copula Models. (arXiv:2009.01726v2 [stat.ML] UPDATED)
    In the presence of right-censored data with covariates, the conditional Kaplan-Meier estimator (also known as the Beran estimator) consistently estimates the conditional survival function of the random follow-up for the event of interest. However, a necessary condition is the unambiguous knowledge of whether each individual is censored or not, which may be incomplete in practice. We therefore propose a study of the Beran estimator when the censoring indicators are generic random variables and discuss necessary conditions for the efficiency of the Beran estimator. From this, we provide a new estimator for the conditional survival function with missing not at random (MNAR) censoring indicators based on a conditional copula model for the missingness mechanism. In addition to the theoretical results, we illustrate how the estimators work for small samples through a simulation study and show their practical applicability by analyzing synthetic and real data.  ( 2 min )
    Convergence analysis of online algorithms for vector-valued kernel regression. (arXiv:2309.07779v1 [stat.ML])
    We consider the problem of approximating the regression function from noisy vector-valued data by an online learning algorithm using an appropriate reproducing kernel Hilbert space (RKHS) as prior. In an online algorithm, i.i.d. samples become available one by one by a random process and are successively processed to build approximations to the regression function. We are interested in the asymptotic performance of such online approximation algorithms and show that the expected squared error in the RKHS norm can be bounded by $C^2 (m+1)^{-s/(2+s)}$, where $m$ is the current number of processed data, the parameter $0<s\leq 1$ expresses an additional smoothness assumption on the regression function and the constant $C$ depends on the variance of the input noise, the smoothness of the regression function and further parameters of the algorithm.  ( 2 min )
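    A minimal sketch of one such online algorithm, shown for scalar outputs for simplicity (the vector-valued case replaces the scalar coefficients with vectors); the kernel, step-size schedule, and toy target below are assumptions rather than the exact scheme analyzed in the paper.

        # Online (stochastic-gradient) kernel regression: the estimate is a growing
        # kernel expansion f_m(x) = sum_i alpha_i k(x_i, x), updated one sample at a time.
        import numpy as np

        def rbf(a, b, gamma=1.0):
            return np.exp(-gamma * np.sum((a - b) ** 2, axis=-1))

        rng = np.random.default_rng(0)
        centers, alphas = [], []
        for m in range(1, 2001):
            x = rng.uniform(-1, 1, size=2)
            y = np.sin(3 * x[0]) + 0.5 * x[1] + 0.1 * rng.normal()   # noisy target
            pred = sum(a * rbf(c, x) for a, c in zip(alphas, centers)) if centers else 0.0
            eta = 0.5 / m ** 0.5                                     # decaying step size
            centers.append(x)
            alphas.append(eta * (y - pred))      # functional gradient step in the RKHS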
    Scalable Bayesian optimization with high-dimensional outputs using randomized prior networks. (arXiv:2302.07260v5 [cs.LG] UPDATED)
    Several fundamental problems in science and engineering consist of global optimization tasks involving unknown high-dimensional (black-box) functions that map a set of controllable variables to the outcomes of an expensive experiment. Bayesian Optimization (BO) techniques are known to be effective in tackling global optimization problems using a relatively small number of objective function evaluations, but their performance suffers when dealing with high-dimensional outputs. To overcome the major challenge of dimensionality, here we propose a deep learning framework for BO and sequential decision making based on bootstrapped ensembles of neural architectures with randomized priors. Using appropriate architecture choices, we show that the proposed framework can approximate functional relationships between design variables and quantities of interest, even in cases where the latter take values in high-dimensional vector spaces or even infinite-dimensional function spaces. In the context of BO, we augment the proposed probabilistic surrogates with re-parameterized Monte Carlo approximations of multiple-point (parallel) acquisition functions, as well as methodological extensions for accommodating black-box constraints and multi-fidelity information sources. We test the proposed framework against state-of-the-art methods for BO and demonstrate superior performance across several challenging tasks with high-dimensional outputs, including a constrained multi-fidelity optimization task involving shape optimization of rotor blades in turbo-machinery.  ( 3 min )
    Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data. (arXiv:2202.05928v4 [cs.LG] UPDATED)
    Benign overfitting, the phenomenon where interpolating models generalize well in the presence of noisy data, was first observed in neural network models trained with gradient descent. To better understand this empirical observation, we consider the generalization error of two-layer neural networks trained to interpolation by gradient descent on the logistic loss following random initialization. We assume the data comes from well-separated class-conditional log-concave distributions and allow for a constant fraction of the training labels to be corrupted by an adversary. We show that in this setting, neural networks exhibit benign overfitting: they can be driven to zero training error, perfectly fitting any noisy training labels, and simultaneously achieve minimax optimal test error. In contrast to previous work on benign overfitting that require linear or kernel-based predictors, our analysis holds in a setting where both the model and learning dynamics are fundamentally nonlinear.  ( 2 min )
    Statistically Valid Variable Importance Assessment through Conditional Permutations. (arXiv:2309.07593v1 [cs.LG])
    Variable importance assessment has become a crucial step in machine-learning applications when using complex learners, such as deep neural networks, on large-scale data. Removal-based importance assessment is currently the reference approach, particularly when statistical guarantees are sought to justify variable inclusion. It is often implemented with variable permutation schemes. On the flip side, these approaches risk misidentifying unimportant variables as important in the presence of correlations among covariates. Here we develop a systematic approach for studying Conditional Permutation Importance (CPI) that is model agnostic and computationally lean, as well as reusable benchmarks of state-of-the-art variable importance estimators. We show theoretically and empirically that $\textit{CPI}$ overcomes the limitations of standard permutation importance by providing accurate type-I error control. When used with a deep neural network, $\textit{CPI}$ consistently showed top accuracy across benchmarks. An empirical benchmark on real-world data analysis in a large-scale medical dataset showed that $\textit{CPI}$ provides a more parsimonious selection of statistically significant variables. Our results suggest that $\textit{CPI}$ can be readily used as drop-in replacement for permutation-based methods.  ( 2 min )
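    A rough sketch of the conditional-permutation idea for a single feature: instead of shuffling x_j directly, shuffle only the part of x_j not explained by the remaining covariates. The linear conditional model, the loss, and the function name below are simplifications chosen for illustration; `model` is assumed to be any fitted regressor with a `.predict` method.

        # Sketch of conditional permutation importance for feature j: permute the
        # residual of x_j given x_{-j}, then measure the increase in prediction loss.
        import numpy as np
        from sklearn.linear_model import LinearRegression
        from sklearn.metrics import mean_squared_error

        def conditional_permutation_importance(model, X, y, j, n_perm=100, seed=0):
            rng = np.random.default_rng(seed)
            X_rest = np.delete(X, j, axis=1)
            cond = LinearRegression().fit(X_rest, X[:, j])    # x_j ~ x_{-j}
            x_j_hat = cond.predict(X_rest)
            resid = X[:, j] - x_j_hat
            base_loss = mean_squared_error(y, model.predict(X))
            increases = []
            for _ in range(n_perm):
                X_perm = X.copy()
                X_perm[:, j] = x_j_hat + rng.permutation(resid)   # conditional shuffle
                increases.append(mean_squared_error(y, model.predict(X_perm)) - base_loss)
            return np.mean(increases)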
    Physics-informed Bayesian inference of external potentials in classical density-functional theory. (arXiv:2309.07065v2 [cond-mat.stat-mech] UPDATED)
    The swift progression of machine learning (ML) has not gone unnoticed in the realm of statistical mechanics. ML techniques have attracted attention by the classical density-functional theory (DFT) community, as they enable discovery of free-energy functionals to determine the equilibrium-density profile of a many-particle system. Within DFT, the external potential accounts for the interaction of the many-particle system with an external field, thus, affecting the density distribution. In this context, we introduce a statistical-learning framework to infer the external potential exerted on a many-particle system. We combine a Bayesian inference approach with the classical DFT apparatus to reconstruct the external potential, yielding a probabilistic description of the external potential functional form with inherent uncertainty quantification. Our framework is exemplified with a grand-canonical one-dimensional particle ensemble with excluded volume interactions in a confined geometry. The required training dataset is generated using a Monte Carlo (MC) simulation where the external potential is applied to the grand-canonical ensemble. The resulting particle coordinates from the MC simulation are fed into the learning framework to uncover the external potential. This eventually allows us to compute the equilibrium density profile of the system by using the tools of DFT. Our approach benchmarks the inferred density against the exact one calculated through the DFT formulation with the true external potential. The proposed Bayesian procedure accurately infers the external potential and the density profile. We also highlight the external-potential uncertainty quantification conditioned on the amount of available simulated data. The seemingly simple case study introduced in this work might serve as a prototype for studying a wide variety of applications, including adsorption and capillarity.  ( 3 min )
    Causal Entropy and Information Gain for Measuring Causal Control. (arXiv:2309.07703v1 [cs.LG])
    Artificial intelligence models and methods commonly lack causal interpretability. Despite the advancements in interpretable machine learning (IML) methods, they frequently assign importance to features which lack causal influence on the outcome variable. Selecting causally relevant features among those identified as relevant by these methods, or even before model training, would offer a solution. Feature selection methods utilizing information theoretical quantities have been successful in identifying statistically relevant features. However, the information theoretical quantities they are based on do not incorporate causality, rendering them unsuitable for such scenarios. To address this challenge, this article proposes information theoretical quantities that incorporate the causal structure of the system, which can be used to evaluate causal importance of features for some given outcome variable. Specifically, we introduce causal versions of entropy and mutual information, termed causal entropy and causal information gain, which are designed to assess how much control a feature provides over the outcome variable. These newly defined quantities capture changes in the entropy of a variable resulting from interventions on other variables. Fundamental results connecting these quantities to the existence of causal effects are derived. The use of causal information gain in feature selection is demonstrated, highlighting its superiority over standard mutual information in revealing which features provide control over a chosen outcome variable. Our investigation paves the way for the development of methods with improved interpretability in domains involving causation.  ( 3 min )
    Scalable Model-Based Gaussian Process Clustering. (arXiv:2309.07882v1 [stat.CO])
    Gaussian process is an indispensable tool in clustering functional data, owing to its flexibility and inherent uncertainty quantification. However, when the functional data is observed over a large grid (say, of length $p$), Gaussian process clustering quickly renders itself infeasible, incurring $O(p^2)$ space complexity and $O(p^3)$ time complexity per iteration; and thus prohibiting its natural adaptation to large environmental applications. To ensure scalability of Gaussian process clustering in such applications, we propose to embed the popular Vecchia approximation for Gaussian processes at the heart of the clustering task, provide crucial theoretical insights towards algorithmic design, and finally develop a computationally efficient expectation maximization (EM) algorithm. Empirical evidence of the utility of our proposal is provided via simulations and analysis of polar temperature anomaly (\href{https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/global/time-series}{noaa.gov}) data-sets.  ( 2 min )
    Optimal and Fair Encouragement Policy Evaluation and Learning. (arXiv:2309.07176v1 [cs.LG])
    In consequential domains, it is often impossible to compel individuals to take treatment, so that optimal policy rules are merely suggestions in the presence of human non-adherence to treatment recommendations. In these same domains, there may be heterogeneity both in who responds in taking-up treatment, and heterogeneity in treatment efficacy. While optimal treatment rules can maximize causal outcomes across the population, access parity constraints or other fairness considerations can be relevant in the case of encouragement. For example, in social services, a persistent puzzle is the gap in take-up of beneficial services among those who may benefit from them the most. When in addition the decision-maker has distributional preferences over both access and average outcomes, the optimal decision rule changes. We study causal identification, statistical variance-reduced estimation, and robust estimation of optimal treatment rules, including under potential violations of positivity. We consider fairness constraints such as demographic parity in treatment take-up, and other constraints, via constrained optimization. Our framework can be extended to handle algorithmic recommendations under an often-reasonable covariate-conditional exclusion restriction, using our robustness checks for lack of positivity in the recommendation. We develop a two-stage algorithm for solving over parametrized policy classes under general constraints to obtain variance-sensitive regret bounds. We illustrate the methods in two case studies based on data from randomized encouragement to enroll in insurance and from pretrial supervised release with electronic monitoring.  ( 2 min )
    On a continuous time model of gradient descent dynamics and instability in deep learning. (arXiv:2302.01952v3 [stat.ML] UPDATED)
    The recipe behind the success of deep learning has been the combination of neural networks and gradient-based optimization. Understanding the behavior of gradient descent however, and particularly its instability, has lagged behind its empirical success. To add to the theoretical tools available to study gradient descent we propose the principal flow (PF), a continuous time flow that approximates gradient descent dynamics. To our knowledge, the PF is the only continuous flow that captures the divergent and oscillatory behaviors of gradient descent, including escaping local minima and saddle points. Through its dependence on the eigendecomposition of the Hessian the PF sheds light on the recently observed edge of stability phenomena in deep learning. Using our new understanding of instability we propose a learning rate adaptation method which enables us to control the trade-off between training stability and test set evaluation performance.  ( 2 min )

  • Open

    Google’s new DeepMind Maps algorithm improves route suggestions by 24%
    Google has developed an AI algorithm to refine route suggestions on Google Maps, personalizing it based on user data and behavior, allegedly improving the accuracy on an average by 16-24 percent. To stay on top of the latest advancements in AI, look here first. https://preview.redd.it/rp21ylzo7iob1.jpg?width=770&format=pjpg&auto=webp&s=737cf3fd278ade42900e8586d626a7c729522c1a Personalized Route Suggestions through AI The AI model comprises 360 million parameters, using real-time data from Maps users to influence factors including travel time, road conditions, tolls, and personal preferences to suggest routes. This technology is grounded on "inverse reinforcement learning" (IRL), specifically a new IRL algorithm - "Receding Horizon Inverse Planning (RHIP)". The Power of RHIP and AI in Maps Google and Deepmind jointly worked to develop RHIP, using complex stochastic models in immediate vicinity areas, but switching to simpler deterministic methods for distant areas for power conservation. The AI improves route suggestions for both driving and two-wheeled vehicles by learning from Maps users' movements and behaviors over time. Google states that this is the largest application of inverse reinforcement learning for route planning to date. Implementation and User Testing Google has applied the algorithm to Maps data globally, but extensive user testing is needed to confirm if the technique consistently produces better routes. Previous attempts at using AI systems for route planning on a large scale have often failed due to the complexity of road networks. (source) P.S. If you like this kind of analysis, I write a free newsletter that tracks the most relevant news and research in AI. Professionals from Google, Meta, and OpenAI are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    Bard on Republican 1/6 denial
    There are a number of reasons why some Republicans may be in denial about Trump's role in the January 6th Capitol riot. Trump has been very effective at cultivating a cult of personality among his supporters. These supporters are often fiercely loyal to him and willing to believe whatever he says, even if it contradicts reality. Many Republicans have been reluctant to criticize Trump for fear of alienating his supporters. This is especially true for Republican politicians who are up for re-election in 2022. Some Republicans may genuinely believe that Trump did not incite the riot. They may point to the fact that he did not explicitly call for violence, or that he told his supporters to "peacefully and patriotically make your voices heard." Other Republicans may be motivated by partis…  ( 10 min )
    Bard on factory farming and clean meat
    Factory farming is a cruel and unsustainable way to produce meat. Animals are crammed into small, filthy spaces, where they are often denied their basic needs. They are given antibiotics to prevent disease and growth hormones to make them grow faster. This leads to a variety of health problems for the animals, as well as for the people who eat their meat. Clean meat is a new technology that offers a more humane and sustainable way to produce meat. It is made from animal cells that are grown in a lab, without the need for animals to be raised or slaughtered. Clean meat is still in its early stages of development, but it has the potential to revolutionize the meat industry. Here are some of the reasons why factory farming must be replaced by clean meat: It is cruel to animals. Factory f…  ( 10 min )
    AI — weekly megathread!
    News provided by aibrews.com Stability AI launched Stable Audio, a generative AI tool for music & sound generation from text. The underlying latent diffusion model architecture uses audio conditioned on text metadata as well as audio file duration and start time [Details]. Coqui released XTTS - a new voice generation model that lets you clone voices in 13 different languages by using just a quick 3-second audio clip [Details]. Microsoft Research released and open-sourced Phi-1.5 - a 1.3 billion parameter transformer-based model with performance on natural language tasks comparable to models 5x larger [Paper ]. Project Gutenberg, Microsoft and MIT have worked together to use neural text-to-speech to create and release thousands of human-quality free and open audiobooks [Details]. Res…  ( 11 min )
    With things always moving so fast, I may have missed it. Anyone doing something like Run.AI in an opensource capacity?
    Has anyone dropped an opensource tool like run.ai, to leverage multiple gpus / distribute the workload a bit more efficiently? I'm loving some of the single gpu llm modifications that have been dropping recently (have a couple i've tested that ran well on 4090 and 3090ti in the lab), but i've got a plethora of 8 & 12 gig 3xxx series cards i'd love to take advantage of beyond passthroughs to individual vms. Looking for any solutions. Speed isn't as important as the ability to distributively run larger models. submitted by /u/SwallowedBuckyBalls [link] [comments]  ( 9 min )
    AI Death Metal band 'Anarchy's Echo' - Debut single 'Soul Shredder' [visuals, music, name, logo all AI generated]
    submitted by /u/glenniszen [link] [comments]  ( 8 min )
    free AI summary of article based on most frequently used English words
    I am looking for a free AI tool, site (may be paid with free account limited options) that can do a summary of English article (like BBC, youtube) based on most frequently used English words, for example Oxford list of 3000 most frequently used words in English. In other words if word is not in that list it should replace the word with the one that is on the list or its definition. Pdf submitted by /u/vibrantform [link] [comments]  ( 9 min )
    AI on why we have much to be grateful for
    I said to Bard: It seems like these are difficult times. Compare today with the WWII years, explaining how much we have to be grateful for. Bard said: I agree that these are difficult times. The world is facing many challenges, including the COVID-19 pandemic, climate change, and political instability. However, I believe that we have much to be grateful for today compared to the WWII years. Here are just a few things: We live in a time of peace. World War II was a global conflict that caused the deaths of millions of people. Today, there are no major wars raging on. We have access to advanced medical care. During WWII, many people died from diseases that are now easily treatable. Today, we have antibiotics, vaccines, and other medical advances that have saved countless lives. We have a higher standard of living. In the WWII years, many people lived in poverty. Today, most people in the developed world have access to clean water, food, and shelter. We have more opportunities for education and advancement. In the WWII years, many people were denied an education because of their race, gender, or social class. Today, education is more accessible than ever before, and people from all backgrounds have the opportunity to succeed. We have a better understanding of the world. During WWII, there was a lot of misinformation and propaganda. Today, we have access to more information than ever before, and we can make more informed decisions about our lives. Of course, there are still many challenges that we face today. But I believe that we have come a long way since the WWII years, and we have much to be grateful for. Let us never forget the sacrifices that were made during that time, and let us continue to work towards a better future for all. submitted by /u/Georgeo57 [link] [comments]  ( 9 min )
    What is at stake in the AI based techno-economic war between the West and China?
    The AI-based techno-economic war between the West and China will determine the global distribution of technology, economic benefits, and influence. The winner of this race, particularly in AI, will have the power to accelerate GDP growth and project global economic benefits. China is actively pursuing technological leadership in AI and 5G wireless, while the West must prioritize technological talent influx and leverage its venture and technology ecosystem. Aggressive AI regulation could hinder the West's momentum in this race. The risks of this war and AI technology are significant, but it is a greater risk to let adversaries have unconstrained power. Global treaties are not enforceable solutions to regulate AI. Source : https://medium.com/@vkhosla/what-is-at-stake-in-this-ai-based-techno-economic-war-between-the-west-and-china-8f76bd291be7 submitted by /u/NuseAI [link] [comments]  ( 9 min )
    What has been the most significant assistance from AI for you? It could be in various aspects of work, learning, and life. Are there any other needs you have that you hope AI can help you with, but there aren't suitable AI tools for them yet?
    There's no need to specify which product fulfills your needs, but we'd like everyone to focus on describing their current frustrations and the issues they urgently hope AI can help them with. submitted by /u/Minna_Z [link] [comments]  ( 9 min )
    “Wine can prevent Cancer” says ChatGPT
    submitted by /u/Agitated-Spell3979 [link] [comments]  ( 8 min )
  • Open

    [P] Building My Own AI-Model Hub: Seeking Guidance and Resources
    Hello everyone! I'm embarking on a project to create an AI-model hub—a platform where users can upload and utilize their AI models. While I'm aware of popular platforms that offer this, my primary goal is for educational purposes. I'd greatly appreciate any recommendations for helpful articles, videos, or codebases to guide me on this journey. Thanks in advance! submitted by /u/Electronic-Choice-86 [link] [comments]  ( 9 min )
    [P] LLMa: Expert Guidance on Generative AI, Tailored for Your Needs, Outdoing GPT-4 & Saving Costs!
    Hello everyone, Introducing LLMa: ChatGPT built around YOU (getllma.com) - a dedicated service offering hands-on expertise to integrate state-of-the-art generative AI tailored for your projects. We utilize open-source models and train them to outperform GPT-4 on tasks specific to your domain. Envision having a seasoned AI specialist on your team, ensuring your model not only rivals the big players but excels in your unique challenges. 🌟 Why LLMa? Personalized Expertise: Our team collaborates closely with you, delving into your needs and sculpting a model that thrives in your domain. Bespoke Training: We refine open-source models (LLaMa, T5, etc.) with plenty of secret tricks to specialize and surpass GPT-4's performance for your specific tasks. Cost-Effective: LLMa tends to be around 100x cheaper than GPT-4, offering significant savings. No recurring fees; invest in a one-time fee based on your model's complexity. Full Ownership: We hand over the model files/weights to you. It's entirely yours, ensuring total privacy with no PII leaks. Deployment Assistance: Beyond just crafting the model, we can guide you in deploying it, ensuring a seamless integration into your operations. Ongoing Support: From initial brainstorming to model deployment, we're with you, ensuring success at every phase. 💼 Tailored for Enterprises: LLMa is meticulously crafted for enterprises that aim for a high-performing, bespoke AI solution. Transparent pricing begins at $500, contingent on your distinct requirements. ❓ Navigating the Generative AI Terrain? Embarking on the vast journey of generative AI? LLMa is your compass. We aid in defining challenges, strategizing solutions, and optimizing the AI potential for your endeavors. If LLMa piques your interest or if you have any queries, fill-in the form, drop a comment below or DM me. I'm all ears and eager to connect! submitted by /u/iliashark [link] [comments]  ( 9 min )
    [P] Deploying Hugging Face models on Amazon SageMaker using infrastructure as code
    Quick template that bootstraps Amazon SageMaker running a LlaMa 2 model from Hugging Face. Everything deployed as code (Python), no manual tweaking in the SageMaker console. www.pulumi.com/blog/mlops-huggingface-llm-aws-sagemaker-python/ submitted by /u/kao-pulumi [link] [comments]  ( 9 min )
    [D] GAN training
    I'm trying to train GANs for oversampling a minority text class (I'm feeding it only the minority class), but the results don't seem to improve much (AUC only improves by .03 so far), while basic oversampling techniques like SMOTE give way better results. Also, I'm using a vector representation for the whole text instead of word embeddings (same as used for SMOTE), and I tried different architectures with CNNs. Are there any tricks, maybe in training the discriminator and generator? I can't seem to find the problem. submitted by /u/SlightSecretaryB [link] [comments]  ( 9 min )
    [R] Agents: An Open-source Framework for Autonomous Language Agents - AIWaves Inc 2023
    Paper: https://arxiv.org/abs/2309.07870 Github: https://github.com/aiwaves-cn/agents Abstract: Recent advances on large language models (LLMs) enable researchers and developers to build autonomous language agents that can automatically solve various tasks and interact with environments, humans, and other agents using natural language interfaces. We consider language agents as a promising direction towards artificial general intelligence and release Agents, an open-source library with the goal of opening up these advances to a wider non-specialist audience. Agents is carefully engineered to support important features including planning, memory, tool usage, multi-agent communication, and fine-grained symbolic control. Agents is user-friendly as it enables non-specialists to build, customize, test, tune, and deploy state-of-the-art autonomous language agents without much coding. The library is also research-friendly as its modularized design makes it easily extensible for researchers. https://preview.redd.it/3bdi71r5rgob1.jpg?width=1131&format=pjpg&auto=webp&s=760942c19be6ecda791414c812a77e72751c526d https://preview.redd.it/howf64r5rgob1.jpg?width=1656&format=pjpg&auto=webp&s=636744fccab7a1c2bafb902bad5dbb647440fff5 ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    Image dataset management tools [D]
    Hi all, I have about 100K images on my machine and I am looking for a tool that can help me do some QA on it. Example features I would love: Search and visualize all images with a prefix "cls1_" or "cls2_" Easily rename file names if they're not named correctly Visualize all captions associated with each image We can assume they have the same name but with extension ".txt" or ".captions" Or there's a metadata.json linking between "img_file" and "caption_file" Easily edit captions in the dashboard I can also work with some kind of metadata file instead of relying on filename logic if it really helps a certain tool. I prefer a locally run, open-source tool. It would be a problem for me to upload this data to any online platform. Many thanks in advance for any help or guidance. submitted by /u/JYP_Scouter [link] [comments]  ( 9 min )
    [D] Testing values are different from "Real World" values
    Before training my model I'm going through multiple steps to collect and process my data. One of these steps is to calculate values from algorithmic and mathematical functions. In my training and testing data the values are around 12-15 decimal places. I then split the data without shuffling. Training, validation and testing average at 75% accuracy. For my next step I wanted to do a "real world" data test where I collect the exact same data as my testing data and predict it using my previously built model, but the values inside test_dataset1 are different from test_dataset2 within the last 6-9 decimal places even though the data is going through the exact same code. For example: Test_dataset1 value: 1.123456789, Test_dataset2 value: 1.123456987. This messes with my prediction, and it's making me wonder: as well as standardising my data, should I be rounding my float values to say 4-6 decimal places? submitted by /u/paddockson [link] [comments]  ( 9 min )
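    If the discrepancy really is floating-point noise from the two pipelines rather than a data issue, comparing with a tolerance (or rounding both sides) is a quick sanity check; a tiny illustration with the values from the post:

        import numpy as np

        a = 1.123456789
        b = 1.123456987   # same quantity produced by the second pipeline

        print(np.isclose(a, b, atol=1e-6))   # True: equal up to 6 decimal places
        print(round(a, 6) == round(b, 6))    # True after rounding both to 6 d.p.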
    [D] What is the difference between the TPU found in Google Tensor chips vs the Neural Engine found in Apple's A and M series?
    What are the key differences between the Tensor Processing Unit (TPU) found in Google Tensor chips and the Neural Engine found in Apple's A and M series chips? Are they the same things? Or is the TPU only available for Google's own AI, while the Neural Engine is available to all developers for accelerating AI for all apps, if they decide to? Can developers optimize apps for Google Tensor like they can for the Neural Engine? If not, how do developers take advantage of machine learning acceleration chips on Google Pixel or Android in general? If yes, let's say a developer optimizes their app for the Google TPU, will they need to re-optimize for other chips like Samsung's NPU or Qualcomm AI too? If not, how well do they run? Are they the same fast and efficient? submitted by /u/GRguy_21 [link] [comments]  ( 9 min )
    [P] Suggestions/Direction: Working on Image Dehazing for Face Images.
    Working on my final year project in the field of computer vision: image dehazing. I am aiming for a novel approach to better dehazing of hazy face images. I have read papers related to single image dehazing & face SR/deblurring. Any thoughts/suggestions/leads would be appreciated. submitted by /u/GahlotB [link] [comments]  ( 9 min )
    [R] Traveling Words: A Geometric Interpretation of Transformers
    Paper: https://arxiv.org/abs/2309.07315 Abstract: Transformers have significantly advanced the field of natural language processing, but comprehending their internal mechanisms remains a challenge. In this paper, we introduce a novel geometric perspective that elucidates the inner mechanisms of transformer operations. Our primary contribution is illustrating how layer normalization confines the latent features to a hyper-sphere, subsequently enabling attention to mold the semantic representation of words on this surface. This geometric viewpoint seamlessly connects established properties such as iterative refinement and contextual embeddings. We validate our insights by probing a pre-trained 124M parameter GPT-2 model. Our findings reveal clear query-key attention patterns in early layers and build upon prior observations regarding the subject-specific nature of attention heads at deeper layers. Harnessing these geometric insights, we present an intuitive understanding of transformers, depicting them as processes that model the trajectory of word particles along the hyper-sphere. ​ https://preview.redd.it/0i302t857gob1.png?width=1864&format=png&auto=webp&s=1da999c014979bdb6c99809d5b38eb5ccfd717d0 submitted by /u/CoolThingsOnTop [link] [comments]  ( 9 min )
    [P] Hampel Python Library with C extensions 🚀
    Repo -> https://github.com/MichaelisTrofficus/hampel_filter The Python library hampel implements the Hampel Filter, which is generally used to detect anomalies in data with a timeseries structure. It basically consists of a sliding window of a parameterizable size. The library was in plain Python before (using pandas for all the sliding operations, median computation etc), but now it has been replaced by a Cython implementation, which speeds up things quite a bit! 😀 It also provides much more valuable information (thresholds, median absolute deviations, etc.), allowing us to create plots like this one: https://preview.redd.it/6j4ubiwgmfob1.png?width=800&format=png&auto=webp&s=bbc56777fce30a464d0bb33ac5126033b3413838 ​ submitted by /u/Hefty-Consequence443 [link] [comments]  ( 9 min )
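    For reference, the core computation behind a Hampel filter is small enough to sketch in plain NumPy; the window handling and parameter names below are illustrative and are not the library's API.

        # Minimal sliding-window Hampel filter: flag points more than t scaled MADs
        # away from the local median.
        import numpy as np

        def hampel(x, window=5, t=3.0):
            x = np.asarray(x, dtype=float)
            outliers = np.zeros(len(x), dtype=bool)
            for i in range(len(x)):
                lo, hi = max(0, i - window), min(len(x), i + window + 1)
                med = np.median(x[lo:hi])
                mad = 1.4826 * np.median(np.abs(x[lo:hi] - med))   # scaled MAD
                if mad > 0 and abs(x[i] - med) > t * mad:
                    outliers[i] = True
            return outliers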
    [Discussion] How to generate custom posts for each user?
    Hi everybody. Currently, I am building a Deep Learning model with the task of automatically generating random posts and tweets. The requirement is that the output must carry the personality of the writer: for example, if the user is the CEO of company A, then the generated post must have the writing style of that CEO or of company A, and similarly for other users. Honestly, I don't know where to start solving this problem. I intend to use RWKV for it, but I'm not sure whether that is a good direction. Is there any related research, or can anyone who has worked on this problem give me some suggestions? submitted by /u/unknow_from_vietnam [link] [comments]  ( 9 min )
    [R] Uncovering mesa-optimization algorithms in Transformers (from Google Research, ETH Zürich, and Google DeepMind)
    Paper. I am not affiliated with this work or its authors. Abstract: Transformers have become the dominant model in deep learning, but the reason for their superior performance is poorly understood. Here, we hypothesize that the strong performance of Transformers stems from an architectural bias towards mesa-optimization, a learned process running within the forward pass of a model consisting of the following two steps: (i) the construction of an internal learning objective, and (ii) its corresponding solution found through optimization. To test this hypothesis, we reverse-engineer a series of autoregressive Transformers trained on simple sequence modeling tasks, uncovering underlying gradient-based mesa-optimization algorithms driving the generation of predictions. Moreover, we show that the learned forward-pass optimization algorithm can be immediately repurposed to solve supervised few-shot tasks, suggesting that mesa-optimization might underlie the in-context learning capabilities of large language models. Finally, we propose a novel self-attention layer, the mesa-layer, that explicitly and efficiently solves optimization problems specified in context. We find that this layer can lead to improved performance in synthetic and preliminary language modeling experiments, adding weight to our hypothesis that mesa-optimization is an important operation hidden within the weights of trained Transformers. Twitter thread about the paper from one of the paper's authors. Nitter thread, for those who aren't signed into Twitter but want to see the entire Twitter thread. Background info: Mesa-Optimization: Explain it like I'm 10 Edition. submitted by /u/Wiskkey [link] [comments]  ( 9 min )
    [D] Can somebody help check my math to see if I'm understanding Microsoft's Retentive Network paper correctly? I'm confused how we are enriching the tokens with enough context.
    Relevant Paper: 2307.08621.pdf (arxiv.org) So the definition of the recurrent representation of the retention mechanism is below: S_n = γ S_{n−1} + K_n^⊺ V_n, Retention(X_n) = Q_n S_n, n = 1, ..., |x|. Here γ is a decay factor, and K, Q, and V have their standard transformer definitions. What confuses me is the derivation of S_n. The formula makes it look like a scalar. But if that's the case, are we saying that for a given token, the retention mechanism is just multiplying the Query by a scalar? That's surprising! How is that able to provide enough context? Here is some code I wrote with GPT to show my understanding of how it works. Is this correct? I use 3 arbitrary tokens of dimension 3, and then pick arbitrary K, Q and V matrices. I also initialize gamma to 0.5 import numpy as np # Tokens x1…  ( 10 min )
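    For what it's worth, reading K_n, Q_n and V_n as length-d row slices makes S_n a d x d matrix (a decayed sum of rank-1 outer products), not a scalar, which is how the state carries context. A toy NumPy sketch under that reading, with arbitrary dimensions and weights (not the paper's code):

        # Toy recurrent retention: S_n = gamma * S_{n-1} + K_n^T V_n is a d x d matrix,
        # Retention(X_n) = Q_n S_n is a length-d row. Shapes only; values are arbitrary.
        import numpy as np

        d, gamma = 3, 0.5
        rng = np.random.default_rng(0)
        tokens = rng.normal(size=(3, d))                  # three arbitrary tokens
        W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))

        S = np.zeros((d, d))
        for x_n in tokens:
            q, k, v = x_n @ W_q, x_n @ W_k, x_n @ W_v     # each is a length-d row
            S = gamma * S + np.outer(k, v)                # K_n^T V_n: rank-1 d x d update
            out = q @ S                                   # retention output for token n
            print(S.shape, out.shape)                     # (3, 3) (3,)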
    [Project]: Correcting Misspelled Words in Urdu language text
    Help required from NLP and Text Researchers !! Hello everyone! I have Urdu language transcriptions (text) which contain many misspelled words that are not part of the Urdu language. I wanted to know do we have any good NLP techniques or methods which can solve this problem for Urdu language? I want to replace these misspelled words with the correct words in Urdu. I have already tried Python libraries and methods such as indic-nlp, Levenshtein distance, UrduHack, Word2vec Urdu etc, but they weren't able to solve this problem. Some of the methods require Urdu dictionaries to find the correct word, which I'm also unable to find open-source on internet (please also help in that if possible). Will appreciate everyone's help and response to this. Thank you! submitted by /u/a_r182 [link] [comments]  ( 9 min )
    [D] Prospective PhD advisors
    Hi everyone! I am a (soon graduating) MSc student at a top European university and I'd like to apply for a ML PhD in the US this Fall. I've done my research on schools and advisors, but I figured there's no harm in also asking in this subreddit. What are some groups/professors that do ML research at US unis in deep learning theory (specifically foundations) and optimization? As an example, I'm talking about topics such as: https://arxiv.org/abs/1902.08129, https://arxiv.org/abs/1711.04735, https://arxiv.org/abs/2306.04637. Thank you all! Cheers! submitted by /u/AlexIsEpic24 [link] [comments]  ( 9 min )
    [D] ML Research Topics (reasonably short)
    So I’m starting my masters thesis project in ML ASAP and need a research topic. What areas (if any) are currently hot / feasible to do research in roughly 6 months with fairly limited compute access? I’m more interested in theory / research heavy areas rather than applied. And probably happier to dig into some hard math rather than taking on a software engineering type project. Any thoughts or general feedback very welcome! Thanks! submitted by /u/Professional-Pace158 [link] [comments]  ( 9 min )
    Practical use cases for skew symmetrical matrices in AI/ML? [D]
    Just came across this property of matrices that I have never seen before as I am contributing to the NumPy codebase and someone asked for this feature to be added and it got me thinking. It is defined as: a skew-symmetric matrix is a square matrix whose transpose equals its negative, i.e. it satisfies the condition A^T = −A. Just wondering, do these have any applications in ML at all? I never came across this in my math classes, so just wondering if the property pops up anywhere else in the field. Maybe in 3D space applications? Or maybe RGB data augmentation? But yeah, not 100% sure. submitted by /u/Ok_Reality2341 [link] [comments]  ( 9 min )
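    One concrete place they appear is exactly the 3D-space guess: the cross-product with a fixed vector is a skew-symmetric matrix, the building block of so(3) rotation generators used in pose estimation and rotation-equivariant models. A quick NumPy check:

        import numpy as np

        def skew(v):
            """Cross-product matrix: skew(v) @ u == np.cross(v, u)."""
            return np.array([[0, -v[2], v[1]],
                             [v[2], 0, -v[0]],
                             [-v[1], v[0], 0]])

        v, u = np.array([1.0, 2.0, 3.0]), np.array([0.5, -1.0, 2.0])
        A = skew(v)
        print(np.allclose(A.T, -A))                 # skew-symmetry: A^T = -A
        print(np.allclose(A @ u, np.cross(v, u)))   # acts as the cross product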
    [D] How much should I focus on DSA?
    I’m an electrical engineering student in college currently, and have been learning about ML for a few months now. I will be starting a ML research paper under my professor from next week. However, my DSA skills are quite sub-par. Should I focus more on ML math and projects, or take sufficient time out for leetcode as well? submitted by /u/4R1N1493 [link] [comments]  ( 9 min )
    Best architecture for an autoencoder for 2D trajectory data? [D]
    Hi, I have a dataset that consists of 2D trajectories and I am aiming to develop an autoencoder architecture to learn a compressed set of features that reasonably represents and can reconstruct the trajectories. The trajectories may look something like this as an example. A 2D image as input would seem to require a very sparse representation with high resolution to track the trajectory path. I am hoping there is a better way to input the path without requiring high resolution. An alternative might be to use an LSTM structure to input it as a sequence, although I'm not sure that solves the resolution issue. Do you have any suggestions? I've worked with 1D time series and 2D images just fine but this is a bit different. https://preview.redd.it/vqz8y3o69bob1.png?width=2020&format=png&auto=webp&s=d8bcc2fe311743c8e78a96055e68f1ad364b48c3 submitted by /u/ZeApelido [link] [comments]  ( 9 min )
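    One common baseline for this kind of data (a sketch only, with layer sizes picked arbitrarily) is a sequence-to-sequence LSTM autoencoder over the raw (x, y) points, which sidesteps rasterizing the path into a sparse high-resolution image; it assumes trajectories padded or resampled to a fixed length T.

        # Sketch of an LSTM autoencoder for fixed-length 2D trajectories of T points.
        import tensorflow as tf

        T, latent_dim = 100, 16
        model = tf.keras.Sequential([
            tf.keras.Input(shape=(T, 2)),                        # (x, y) per time step
            tf.keras.layers.LSTM(64),                            # encode to a vector
            tf.keras.layers.Dense(latent_dim, name="latent"),    # compressed features
            tf.keras.layers.RepeatVector(T),                     # feed latent at each step
            tf.keras.layers.LSTM(64, return_sequences=True),     # decode the sequence
            tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(2)),
        ])
        model.compile(optimizer="adam", loss="mse")
        # model.fit(trajectories, trajectories, epochs=50)       # reconstruct the input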
  • Open

    Timing attacks
    If you ask someone a question and they say “yes” immediately, that gives you different information than if they pause and slowly say “yes.” The information you receive is not just the response but also the time it took to generate the response. Encryption can be analogous. The time it takes to encrypt data can […] Timing attacks first appeared on John D. Cook.  ( 5 min )
    Elliptic curve Diffie-Hellman key exchange
    I concluded the previous post by saying elliptic curve Diffie-Hellman key exchange (ECDHE) requires smaller keys than finite field Diffie-Hellman (FFDHE) to obtain the same level of security. How much smaller are we talking about? According to NIST recommendations, a 256-bit elliptic curve provides about the same security as working over a 3072-bit finite […] Elliptic curve Diffie-Hellman key exchange first appeared on John D. Cook.  ( 6 min )
    Finite field Diffie Hellman primes
    Diffie-Hellman key exchange is conceptually simple. Alice and Bob want to generate a shared cryptographic key. They want to use asymmetric (public) cryptography to share a symmetric (private) key. The starting point is a large prime p and a generator 1 < g < p. Alice generates a large random number x, her private key, […] Finite field Diffie Hellman primes first appeared on John D. Cook.  ( 6 min )
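    The mechanics fit in a few lines of Python; the parameters below are toy values for illustration only (real deployments use standardized primes of the sizes these posts discuss). The prime chosen here is the largest 64-bit prime, 2^64 - 59.

        # Toy finite-field Diffie-Hellman: both sides derive the same shared secret.
        import secrets

        p = 0xFFFFFFFFFFFFFFC5               # 2**64 - 59; far too small for real use
        g = 2
        x = secrets.randbelow(p - 2) + 1     # Alice's private key
        y = secrets.randbelow(p - 2) + 1     # Bob's private key
        A = pow(g, x, p)                     # Alice sends A
        B = pow(g, y, p)                     # Bob sends B
        assert pow(B, x, p) == pow(A, y, p)  # both compute g**(x*y) mod p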
  • Open

    Breakthrough way to train neuromorphic chips
    submitted by /u/keghn [link] [comments]  ( 8 min )
    Best “Math” book for creating neural networks
    So I want to create a neural network from scratch, with no libraries (TensorFlow, PyTorch, etc…), so what's the best book for that? I know both calculus and statistics, so I'm assuming that the math wouldn't be a problem. Also I will be using CUDA for its speed. submitted by /u/GateCodeMark [link] [comments]  ( 9 min )
    Announcing the RobotPerf™ Benchmarks Beta Release: An industry standard for benchmarking robotic brains
    https://news.accelerationrobotics.com/robotperf-beta/ Together with AMD, Intel, Ford, Harvard, Klagenfurt University, Georgia Institute of Technology, Boston University, Johannes Kepler University Linz, Barnard College, Columbia University and Carnegie Mellon University we are thrilled to introduce the beta release of RobotPerf™ Benchmarks, an advanced benchmarking suite crafted specifically to evaluate robotics computing performance using ROS 2 as its baseline. In this beta release, we not only showcase new benchmarks and results but also introduce novel visualization capabilities. The complete release is available at https://github.com/robotperf/benchmarks/releases/tag/beta. https://preview.redd.it/5whys5ufudob1.png?width=1562&format=png&auto=webp&s=08a6e22a0b07b26fa6340f59ec9df822ab49c9d0 submitted by /u/pablocarrera [link] [comments]  ( 9 min )
  • Open

    Double DQN do not learn anything
    Hi, i just finished the coursera ml course and i wanted to create myself a double dqn model but my model don't seem to learn anything, it always return very low rewards (-100 to -300) even after playing 2000 episodes. I've been stuck on this for 4 days without any hope to find the solution, any help would be welcome :') thank you in advance import random import numpy as np import gymnasium as gym import tensorflow as tf from collections import deque, namedtuple from tensorflow.keras import Sequential, Input from tensorflow.keras.layers import Dense from tensorflow.keras.optimizers import Adam from tensorflow.keras.losses import MeanSquaredError import matplotlib.pyplot as plt # function creating the models def createModel(inputSize, outputSize): model = Sequential([ Input(inputSize),…  ( 10 min )
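    One detail worth double-checking in any double DQN implementation is the target: the next action is selected with the online network but evaluated with the target network. A minimal NumPy sketch of that target computation (array names and shapes are illustrative, not tied to the code above):

        # Double DQN target: pick the greedy next action with the online net,
        # then evaluate that action with the target net.
        import numpy as np

        def double_dqn_targets(rewards, dones, q_next_online, q_next_target, gamma=0.99):
            """rewards, dones: shape (batch,); q_next_*: shape (batch, n_actions)."""
            best_actions = np.argmax(q_next_online, axis=1)       # argmax_a Q_online(s', a)
            next_q = q_next_target[np.arange(len(rewards)), best_actions]
            return rewards + gamma * (1.0 - dones) * next_q       # no bootstrap at terminal states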
    Actor-Critic and other implementations
    I'm confused by several algorithms that are based on an actor-critic approach. In TD3 and SAC, it is understandable that each is implemented to serve its purpose (deterministic and stochastic actions, respectively). But in the Dreamer algorithm (DreamerV3), why is it necessary to combine actor and critic networks with the model-based planning approach, when the model-based part is also able to choose an action by planning over simulated states? Does it mean that using the model to simulate possible futures and then updating the critic from those simulations is a good way to train an agent? submitted by /u/AnnonymeowCat [link] [comments]  ( 9 min )
  • Open

    MediaPipe FaceStylizer: On-device real-time few-shot face stylization
    Posted by Haolin Jia, Software Engineer, and Qifei Wang, Senior Software Engineer, Core ML In recent years, we have witnessed rising interest across consumers and researchers in integrated augmented reality (AR) experiences using real-time face feature generation and editing functions in mobile applications, including short videos, virtual reality, and gaming. As a result, there is a growing demand for lightweight, yet high-quality face generation and editing models, which are often based on generative adversarial network (GAN) techniques. However, the majority of GAN models suffer from high computational complexity and the need for a large training dataset. In addition, it is also important to employ GAN models responsibly. In this post, we introduce MediaPipe FaceStylizer, an effi…  ( 92 min )
  • Open

    Learn how to build and deploy tool-using LLM agents using AWS SageMaker JumpStart Foundation Models
    Large language model (LLM) agents are programs that extend the capabilities of standalone LLMs with 1) access to external tools (APIs, functions, webhooks, plugins, and so on), and 2) the ability to plan and execute tasks in a self-directed fashion. Often, LLMs need to interact with other software, databases, or APIs to accomplish complex tasks. […]  ( 13 min )
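    To make the loop concrete, here is a stripped-down sketch of a tool-using agent (every name here, including call_llm, is a hypothetical placeholder rather than the SageMaker JumpStart API): at each step the model either asks for a registered tool or returns a final answer.

        # Hypothetical helper that sends a prompt to some hosted LLM and returns text.
        def call_llm(prompt: str) -> str:
            raise NotImplementedError("wire this to your model endpoint")

        TOOLS = {
            "search": lambda q: f"(search results for {q!r})",
            "calculator": lambda expr: str(eval(expr)),  # toy example only, never eval untrusted input
        }

        def run_agent(task: str, max_steps: int = 5) -> str:
            transcript = f"Task: {task}\n"
            for _ in range(max_steps):
                reply = call_llm(transcript + "Respond with 'TOOL <name> <input>' or 'FINAL <answer>'.")
                if reply.startswith("FINAL"):
                    return reply.removeprefix("FINAL").strip()
                _, name, arg = reply.split(" ", 2)              # e.g. "TOOL search weather in Paris"
                transcript += f"{reply}\nObservation: {TOOLS[name](arg)}\n"
            return "No answer within step budget."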
  • Open

    AI-driven tool makes it easy to personalize 3D-printable models
    With Style2Fab, makers can rapidly customize models of 3D-printable objects, such as assistive devices, without hampering their functionality.  ( 10 min )

  • Open

    [D] Besides something about LLM, is there any new or interesting research you think is worth reading?
    Please provide arXiv links. If you want to share your thoughts, then go for it. By new I mean within the last 6 months. submitted by /u/I_will_delete_myself [link] [comments]  ( 9 min )
    [D] Good papers on posterior collapse in VAEs
    What are some good papers to understand posterior collapse in VAEs? submitted by /u/randomkolmogorov [link] [comments]  ( 9 min )
    [P] Create an Object Detector for Any Game Using YOLO
    Full Video Tutorial: https://www.linkedin.com/posts/moisesdias_english-version-below-tutorial-crie-activity-7107686497885011969-ZLVW/ Hello everyone! Have you ever thought about how to create an object detection system using YOLO that works with any game? If you're interested, I've created a tutorial with all the steps to develop this system. I'll leave a link to the video where I demonstrate the process step by step using the game Diablo 2 as an example. I hope you enjoy it, and if you have any suggestions, feel free to send a message or comment here! submitted by /u/moisesdepaulodias [link] [comments]  ( 9 min )
    [P] Would anyone know of any information regarding esrb ratings?
    Good evening, I and some fellow students are taking an SE class and are looking for relevant information regarding ESRB ratings and games for a research project. Does anyone know of any data pertaining to relevant ESRB info that we can access? We have a few sources and are waiting to hear back from the ESRB themselves. Would anyone know if they permit web scraping, or if there is a CSV containing relevant information, or even perhaps an API we could use? Any information would help, and thank you all for taking the time to read this. Thanks in advance! submitted by /u/GOD_LIKE_WOW [link] [comments]  ( 9 min )
    [P] Ways to speed up llama-2 summarization on sagemaker?
    I'm currently working on a project to give a quick summary of long articles/conversations. I'm running llama-2-7b-chat-hf with 4-bit quantization on a g5.2xlarge instance on SageMaker. The method I'm using is map_reduce (option 2) from this webpage: https://python.langchain.com/docs/use_cases/summarization. Of everything I've tried, this is the only one that's been able to produce decent summaries in a reasonable amount of time. However, with really long articles (10,000+ words) it takes ~6 minutes before giving an output. I tried running the same thing on a g5.12xlarge instance, which has 4 A10G GPUs, but it hasn't reduced the time by any noticeable amount. Is there anything else I could be doing to speed this up? submitted by /u/Able_Body_9654 [link] [comments]  ( 9 min )
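    For reference, the map-reduce pattern itself is easy to sketch without LangChain (summarize below is a placeholder for whatever the deployed llama-2 endpoint returns); the usual speed levers are chunk size, sending the map-step calls in parallel, and limiting the number of reduce rounds.

        def summarize(text: str) -> str:
            # Placeholder: call your deployed llama-2 endpoint here.
            raise NotImplementedError

        def chunk(text: str, size: int = 3000) -> list[str]:
            return [text[i:i + size] for i in range(0, len(text), size)]

        def map_reduce_summary(document: str) -> str:
            partial = [summarize(c) for c in chunk(document)]   # "map" step: summarize each chunk
            combined = "\n".join(partial)
            return summarize(combined)                          # "reduce" step: summarize the summaries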
    [D] What prompt should I use with llama2 for context-based generative question answering?
    As the question states, I want to use llama2 to generate an answer to a question based on the context (or the article, to be more precise). No fine-tuning is needed; I just want to predict the answer, but I can't find the right prompt to get a well-structured answer. My dataset contains two columns, one for articles and the other for questions. Example: context: an article about World War 2. Question: based on the text, describe how WW2 started and what the political effects on Europe were. submitted by /u/kaoutar- [link] [comments]  ( 9 min )
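    One commonly used layout for the llama-2-chat models puts the instructions in a system block and the article plus question in the user turn; a hedged template (the wording is mine, adjust to taste) might look like this:

        def build_prompt(article: str, question: str) -> str:
            # Llama-2-chat style prompt: system block inside <<SYS>> tags, user turn inside [INST].
            return (
                "<s>[INST] <<SYS>>\n"
                "You answer questions using only the article provided. "
                "If the answer is not in the article, say so.\n"
                "<</SYS>>\n\n"
                f"Article:\n{article}\n\n"
                f"Question: {question} [/INST]"
            )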
    [P] Coqui releases XTTS an open-access foundational Voice Cloning model!
    There's a new open-access foundational audio model in town! Standing on the shoulders of TorToiSe TTS - XTTS allows cross-language and multi-lingual speech generation with just 3 lines of code 🐸 Key facts about the model: 1. Supports 13 languages. 2. Voice cloning with just a 3-second audio clip. 3. Emotion and style transfer by cloning. 4. Cross-language voice cloning. Try it out on HF Hub: https://huggingface.co/spaces/coqui/xtts submitted by /u/vaibhavs10 [link] [comments]  ( 9 min )
    [R] Large Language Models for Compiler Optimization - Meta AI 2023 - The autotuner needs 949 CPU-days to achieve nearly the same result as this approach achieves in one shot!
    Paper: https://arxiv.org/abs/2309.07062 Abstract: We explore the novel application of Large Language Models to code optimization. We present a 7B-parameter transformer model trained from scratch to optimize LLVM assembly for code size. The model takes as input unoptimized assembly and outputs a list of compiler options to best optimize the program. Crucially, during training, we ask the model to predict the instruction counts before and after optimization, and the optimized code itself. These auxiliary learning tasks significantly improve the optimization performance of the model and improve the model's depth of understanding. We evaluate on a large suite of test programs. Our approach achieves a 3.0% improvement in reducing instruction counts over the compiler, outperforming two state-of-the-art baselines that require thousands of compilations. Furthermore, the model shows surprisingly strong code reasoning abilities, generating compilable code 91% of the time and perfectly emulating the output of the compiler 70% of the time. submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [D] Searching for discussion about chunking algorithms and strategy
    Hi everyone! I'm still experimenting with my own implementation of RAG, and I deployed my custom chunking function (I honestly don't like the methods in LangChain). Anyway, I'm searching for alternative methods, algorithms (NLP or not) and models... There is lots of info and many different implementations of RAG, but as far as I can see, no one puts much effort into improving chunking quality. Also, is there another approach than the one I'm currently using? bi-encoder (instructor) -> cross-encoder (reranking) -> LLM. Can someone share some resources, repos, libraries or existing implementations of different chunking methods? (Or simply discuss some ideas, thoughts or approaches here.) Thanks in advance for your time! submitted by /u/Distinct-Target7503 [link] [comments]  ( 9 min )
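    As a baseline to compare against, here is a minimal fixed-size chunker with overlap (my own sketch, not a LangChain method); most custom strategies are variations on how the split boundaries and the overlap are chosen.

        def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
            # Slide a window of chunk_size characters, stepping back by `overlap`
            # so content cut at a boundary still appears at the start of the next chunk.
            step = chunk_size - overlap
            return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]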
    [D] Gradient descent in regularized least squares
    The problem comes from Chapter 3 of Wright, Stephen J. and Benjamin Recht (2022), Optimization for Data Analysis, Cambridge University Press. I am solving the problem I attached and have a doubt about section (f); I have solved sections (a)-(e). In section (e) I obtained that I need $$ k \geq \frac{\lambda_{\text{max}}\left(\frac{2}{N} A^TA + 2\mu I\right)}{\lambda_{\text{min}}\left(\frac{2}{N} A^TA + 2\mu I\right)}\log\left(\frac{f(x^0)-f(x_\mu)}{\epsilon}\right). $$ However, section (f) asks for a tight upper bound, and the only bound I can think of is $f(\hat{x}) \leq f_\mu(x_\mu) + ||\hat{x}||^2+ \epsilon$, which is very simple. Do you think I can use the result from (e) to obtain another bound, or what would you do? submitted by /u/ItsGauss [link] [comments]  ( 9 min )
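    For anyone following along, a small numerical sketch of the setting (illustration only, not part of the exercise): gradient descent on $f_\mu(x) = \frac{1}{N}\|Ax-b\|^2 + \mu\|x\|^2$ with step size $1/L$, where $L$ is the largest eigenvalue of $\frac{2}{N}A^TA + 2\mu I$.

        import numpy as np

        rng = np.random.default_rng(0)
        N, d, mu = 200, 20, 0.1
        A, b = rng.normal(size=(N, d)), rng.normal(size=N)

        H = (2 / N) * A.T @ A + 2 * mu * np.eye(d)   # Hessian of the regularized objective
        L = np.linalg.eigvalsh(H).max()              # Lipschitz constant of the gradient

        def grad(x):
            return (2 / N) * A.T @ (A @ x - b) + 2 * mu * x

        x = np.zeros(d)
        for _ in range(500):                         # plain gradient descent with step size 1/L
            x -= grad(x) / L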
    [P] Guide: Implementing ImageNet classification using Deep CNNs Paper.
    I need help getting started with implementing a research paper. I'm implementing the ImageNet classification paper for my final-year undergrad mini-project. Any advice on how to get started is appreciated. I have mid-level machine learning knowledge and am ready to pick up the required concepts on the go. Please help. Thank you :) Link: https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf submitted by /u/DrBeans0 [link] [comments]  ( 9 min )
    [D] Use LLM to analyse and port software written in C (very long files)
    Hi, I'm trying to figure out the best way to use LLMs to analyse very old software written entirely in C. I've tried some basic prompts with ChatGPT and it seems to recognise the language. The situation is that I have many .c files with thousands of lines and a lot of redundant code. Moreover, since there are a lot of data structures with variable names that are not easily understandable, I need to provide some context to let the model try to understand what the code does. My worry is that by providing all the needed info plus the file itself (even considering one file at a time) I could consume the entire model context and therefore leave no room for generating anything of value. Has anyone had the opportunity to face similar problems? Ideas? Cheers, Alexio submitted by /u/Alexioc [link] [comments]  ( 9 min )
    [N] MIT-IBM Watson AI Lab releases MoLM suite with three small sparse MoE models, the largest of which (8B params with 700M experts) performs on par with Pythia 2.8B while its throughput is comparable to Pythia 1.4B
    Paper: https://arxiv.org/abs/2306.04640 GitHub: https://github.com/ibm/moduleformer (under Apache 2.0) Twitter thread: https://twitter.com/Yikang_Shen/status/1702041129267388678 Abstract: Large Language Models (LLMs) have achieved remarkable results. However, existing models are expensive to train and deploy, and it is also difficult to expand their knowledge beyond pre-training data without forgetting previous knowledge. This paper proposes a new neural network architecture, ModuleFormer, that leverages modularity to improve the efficiency and flexibility of large language models. ModuleFormer is based on the Sparse Mixture of Experts (SMoE). Unlike the previous SMoE-based modular language model, which requires domain-labeled data to learn domain-specific experts, ModuleFormer can i…  ( 10 min )
    Assigning imbalanced labels to an "Other" class in scikit-learn [P]
    Hey there, I haven't done any ML in some time and have forgotten the basics, so I was hoping you could help me. I trained an SVC model on a small dataset (around 1400 unique records). I have 13 classes, which are badly distributed in the training set and imbalanced: 3 classes account for around 80% of all records. What the business wants is to keep those 3 classes and categorize the rest as an "Other" class. In the future they may be able to generate more training data for the remaining labels. How should I do it? I know I can reassign everything else to the "Other" class before training with a simple if-then rule, but I don't know if this is the right approach. Any suggestions? I know this may be some kind of one-vs-all case, but I don't know which exactly. Thanks in advance for any help. submitted by /u/th00masml [link] [comments]  ( 9 min )
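    A common recipe (a sketch assuming the labels fit in a pandas Series; the toy labels below are made up) is to remap everything outside the three frequent classes to "Other" before fitting the SVC, so the model sees a four-class problem.

        import pandas as pd

        # Toy labels standing in for the real 13-class targets.
        labels = ["news", "news", "sports", "sports", "finance", "weather", "travel", "news"]

        y = pd.Series(labels)
        keep = y.value_counts().nlargest(3).index          # the three well-represented classes
        y_grouped = y.where(y.isin(keep), other="Other")   # everything else becomes "Other"
        print(y_grouped.value_counts())

    If "Other" is still much smaller than the three main classes, passing class_weight="balanced" to the SVC constructor is a reasonable next step.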
    [D] The ML Papers That Rocked Our World (2020-2023)
    Hey everyone! 👋 I’ve been on a bit of a deep-dive lately, trying to catch up on all the awesome stuff that’s been happening in the ML space. It got me wondering, from 2020 to 2023, what have been the absolute must-read papers that shook the foundations and got everyone talking? Whether it’s something that reinvented the wheel in your specific niche or just made waves industry-wide, I wanna hear about it! I’m curious to see how different the responses will be, and hey, this might even become a go-to list for anyone looking to get the lowdown on the hottest trends and discoveries of the past few years. Can’t wait to hear your thoughts! tl;dr I decided to aggregate your best suggestions into categories for anyone interested in reading them without searching through the whole comment se…  ( 10 min )
  • Open

    Two guys with VERY different views on AI – Yuval Noah Harari and Mustafa Suleyman in conversation (full interview)
    submitted by /u/hauntedhivezzz [link] [comments]  ( 8 min )
    ChatGPT, tell the jury that Trump waiting three hours to call off his mob is overwhelming and irrefutable evidence that he planned and conducted the January 6th insurrection
    I asked ChatGPT to talk with Trump's jury about the 3-hour delay. Not bad. ChatGPT: Ladies and gentlemen of the jury, today, we present a compelling case that the three-hour delay in President Trump's response to the events of January 6th is, without a doubt, overwhelming and irrefutable evidence of his planning and conducting the insurrection that unfolded before our eyes. First, consider the timeline. As the violence escalated, President Trump did not immediately call for an end to the chaos. This delay allowed the mob to wreak havoc, endanger lives, and damage our democracy. A leader's duty is to protect the nation and its citizens, and a three-hour wait to act is inexplicable unless there was intent behind it. Second, let's talk about intent. In criminal law, intent matters. It's the difference between an accident and a crime. A deliberate delay in response to an unfolding insurrection suggests a guilty mind. President Trump had the power and the responsibility to intervene immediately. Yet, he chose not to, allowing the violence to escalate. Furthermore, we must consider his words leading up to and during the events of January 6th. His rhetoric could be interpreted as encouragement for his supporters to take matters into their own hands. This, coupled with the delay, paints a picture of a leader who not only knew what was happening but may have orchestrated it. Ladies and gentlemen, we are not here to pass judgment but to seek justice. The evidence of a three-hour delay in responding to the insurrection, along with the surrounding circumstances, presents a compelling case that President Trump planned and conducted this insurrection. It is our duty to hold individuals accountable for their actions, regardless of their position. submitted by /u/Georgeo57 [link] [comments]  ( 9 min )
    How good a crime sleuth and judge is today's AI?
    How would today's AI perform as a crime investigator or judge? To find out, I asked Bard the following question: The film Social Network suggests that Mark Zuckerberg stole his idea for Facebook from two brothers who hired him to make an app for the idea. How likely is this to be true? Bard seemed informed and sharp: "The film The Social Network is a dramatization of the founding of Facebook, and as such, it takes some liberties with the truth. However, the film does accurately portray the fact that Mark Zuckerberg was sued by the Winklevoss twins, Cameron and Tyler, who claimed that he stole their idea for a social networking website called HarvardConnection. The Winklevoss twins allege that they met with Zuckerberg in December 2003 to discuss their idea for HarvardConnection, and tha…  ( 10 min )
    Which AI to use for this idea?
    Hi there! I'm participating in a Hackathon at work and am wondering if this community can give me some direction. Every year our company receives mandates / updates from different parties. In turn, our development teams have to parse through heavy documentation to figure out what needs to change in our code. Ingesting the data is what takes the longest. Our goal is to feed the mandates documentation through an AI and have it return what is needed to be changed in our code. For example, something might say field 200 now needs to include a 6 digit date format vs the 4 digit date format we've had in years past. We have secured a license for Azure AI but honestly no idea if that is the right AI to use. I youtubed a bunch of videos on document processing but I'm also not sure if that is what we are trying to do. Any advice on this is much appreciated. submitted by /u/HillyjoKokoMo [link] [comments]  ( 9 min )
    Looking for a meeting assistant
    I'm looking for a program that will transcribe live audio playing from my computer (windows). Do you know something like that? I've seen Buzz (https://chidiwilliams.github.io/buzz/docs/usage), but it needs an audio loopback driver in order to work, so I wonder if there are others. 🚀 Bonus points if it recognizes different people talking. 🚀 Extra bonus points if it can transcribe multiple languages. submitted by /u/AleHoju [link] [comments]  ( 9 min )
  • Open

    On-device content distillation with graph neural networks
    Posted by Gabriel Barcik and Duc-Hieu Tran, Research Engineers, Google Research In today's digital age, smartphones and desktop web browsers serve as the primary tools for accessing news and information. However, the proliferation of website clutter — encompassing complex layouts, navigation elements, and extraneous links — significantly impairs both the reading experience and article navigation. This issue is particularly acute for individuals with accessibility requirements. To improve the user experience and make reading more accessible, Android and Chrome users may leverage the Reading Mode feature, which enhances accessibility by processing webpages to allow customizable contrast, adjustable text size, more legible fonts, and to enable text-to-speech utilities. Additionally, An…  ( 93 min )
  • Open

    Does C++ in MuJoCo have benefits over Python for reinforcement learning?
    I'm trying to build a humanoid model and then train it to perform some tasks. I have decided to go with MuJoCo for the simulation, and now I'm wondering if I should use the C++ API or the Python one. The Python implementation says it uses the C API, but is it good? Also, if it's slower than the C++ one, how much slower is it? I'll probably have to make something real-time and hence can't compromise much on speed, but if the difference is only small, it's acceptable. I would really appreciate some guidance in this matter. Thank you. submitted by /u/rakk109 [link] [comments]  ( 9 min )
  • Open

    Build a classification pipeline with Amazon Comprehend custom classification (Part I)
    In the first part of this multi-part blog series, you will learn how to create a scalable training pipeline and prepare training data for Amazon Comprehend custom classification models. We will introduce a custom classifier training pipeline that can be deployed in your AWS account with a few clicks.  ( 10 min )
    Fine-tune Falcon 7B and other LLMs on Amazon SageMaker with @remote decorator
    Today, generative AI models cover a variety of tasks from text summarization, Q&A, and image and video generation. To improve the quality of output, approaches like n-shot learning, prompt engineering, Retrieval Augmented Generation (RAG) and fine-tuning are used. Fine-tuning allows you to adjust these generative AI models to achieve improved performance on your domain-specific […]  ( 8 min )
    Simplify access to internal information using Retrieval Augmented Generation and LangChain Agents
    This post takes you through the most common challenges that customers face when searching internal documents, and gives you concrete guidance on how AWS services can be used to create a generative AI conversational bot that makes internal information more useful. Unstructured data accounts for 80% of all the data found within organizations, consisting of […]  ( 14 min )
  • Open

    Yocto, ROS 2, and Hardware Acceleration: A Production-Grade Trio for Robotics
    submitted by /u/pablocarrera [link] [comments]  ( 8 min )
  • Open

    Chinese Remainder Theorem synthesis algorithm
    Suppose m = pq where p and q are large, distinct primes. In the previous post we said that calculations mod m can often be carried out more efficiently by working mod p and mod q, then combining the results to get back to a result mod m. The Chinese Remainder Theorem assures us that […] Chinese Remainder Theorem synthesis algorithm first appeared on John D. Cook.  ( 5 min )
    Gaining efficiency by working modulo factors
    Suppose m is a large integer that you are able to factor. To keep things simple, suppose m = pq where p and q are distinct primes; everything in this post generalizes easily to the case of m having more than two factors. You can carry out calculations mod m more efficiently by carrying out […] Gaining efficiency by working modulo factors first appeared on John D. Cook.  ( 5 min )
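    A small sketch combining the two posts above: carry out the exponentiation mod p and mod q separately, then recombine with the Chinese Remainder Theorem (the modular inverse uses Python 3.8+'s three-argument pow; the primes here are toy-sized).

        p, q = 10007, 10009          # two distinct primes (toy sizes)
        m = p * q
        a, e = 123456789, 65537

        rp = pow(a, e, p)            # work modulo each factor separately
        rq = pow(a, e, q)

        # CRT synthesis: the unique x mod m with x ≡ rp (mod p) and x ≡ rq (mod q).
        x = (rp * q * pow(q, -1, p) + rq * p * pow(p, -1, q)) % m

        assert x == pow(a, e, m)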
  • Open

    Microsoft at ACM SIGCOMM 2023: Innovating the future of networking
    Modern applications heavily rely on robust network infrastructure, requiring continuous innovation. In this evolving landscape, Microsoft is at the forefront, spearheading innovation efforts in networking and strengthening the foundational network infrastructure that underpins the cloud ecosystem. By investing in and enhancing this critical infrastructure, Microsoft not only ensures the resilience and scalability of cloud services […] The post Microsoft at ACM SIGCOMM 2023: Innovating the future of networking appeared first on Microsoft Research.  ( 10 min )
    AI Frontiers: The future of scale with Ahmed Awadallah and Ashley Llorens
    What’s the driving force behind AI’s recent, rapid progress? Research manager Ahmed Awadallah shares his insights on this, the two-stage approach to training large-scale models, and the need for better model evaluation in this episode of the #MSRPodcast. The post AI Frontiers: The future of scale with Ahmed Awadallah and Ashley Llorens appeared first on Microsoft Research.  ( 31 min )
  • Open

    Are data science certifications the gateway to competitive pay?
    Working as a data scientist is the dream of many IT professionals these days. It is no secret that data science is a skyrocketing field attracting young professionals and inspiring many to switch careers to data science. On one front are young professionals who study their courses in colleges to pursue their dream of becoming… The post Are data science certifications the gateway to competitive pay? appeared first on Data Science Central.  ( 19 min )
    CUPED for starters: Enhancing controlled experiments with pre-experiment data
    CUPED: Improve Your A/B Testing - Detect Smaller Gains, Utilise Smaller Samples and Make Smarter Decisions! The post CUPED for starters: Enhancing controlled experiments with pre-experiment data appeared first on Data Science Central.  ( 26 min )
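    For readers who want the mechanics in a few lines: CUPED subtracts from each user's metric the part explained by a pre-experiment covariate, which lowers variance without biasing the treatment-effect estimate. A toy NumPy sketch (synthetic data, not from the article):

        import numpy as np

        rng = np.random.default_rng(1)
        pre = rng.normal(10, 2, size=5000)               # pre-experiment metric per user
        post = pre + rng.normal(0, 1, size=5000)         # experiment-period metric, correlated with pre

        theta = np.cov(pre, post)[0, 1] / np.var(pre, ddof=1)   # regression coefficient of post on pre
        post_cuped = post - theta * (pre - pre.mean())          # adjusted metric: same mean, lower variance

        print(post.var(), post_cuped.var())              # the CUPED-adjusted variance is noticeably smaller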
    Searching for sustainability in data center cooling
    Data centers are known for their impact on the environment. They run 24/7 and exude a lot of heat. Massive warehouses full of hot technology require advanced cooling systems or an HVAC system pushed to its limit.  Data center managers and sustainability leaders no longer settle for antiquated techniques. They’re striving to develop greener and… The post Searching for sustainability in data center cooling appeared first on Data Science Central.  ( 20 min )
    Collaborative visual knowledge graph modeling at the system level
    The best way to model business and consumer dynamics is collaboratively, with stakeholders all in the same virtual room contributing. Of course, this has been happening asynchronously for some time now, but the potential exists for more real-time interaction.  Modelers don’t work in a vacuum, of course. The iterations between a modeler who develops a… The post Collaborative visual knowledge graph modeling at the system level appeared first on Data Science Central.  ( 20 min )
  • Open

    Shout at the Devil: Capcom’s ‘Devil May Cry 5’ Joins GeForce NOW
    GFN Thursday is downright demonic, as Devil May Cry 5 comes to GeForce NOW. Capcom’s action-packed third-person brawler leads 15 titles joining the GeForce NOW library this week, including Gears Tactics and The Crew Motorfest. It’s also the last week to take on the Ultimate KovaaK’s Challenge. Get on the leaderboard today for a chance…  ( 6 min )

  • Open

    Implementing Gradient Descent in PyTorch
    The gradient descent algorithm is one of the most popular techniques for training deep neural networks. It has many applications in fields such as computer vision, speech recognition, and natural language processing. While the idea of gradient descent has been around for decades, it’s only recently that it’s been applied to applications related to deep […] The post Implementing Gradient Descent in PyTorch appeared first on MachineLearningMastery.com.  ( 25 min )
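    The core loop from tutorials like this one fits in a few lines; a minimal sketch that fits a single slope parameter with hand-rolled gradient descent on top of autograd:

        import torch

        x = torch.linspace(-1, 1, 100)
        y = 3 * x + 0.1 * torch.randn(100)        # noisy data from y = 3x

        w = torch.zeros(1, requires_grad=True)
        lr = 0.1
        for _ in range(200):
            loss = ((w * x - y) ** 2).mean()      # mean squared error
            loss.backward()                       # autograd fills w.grad
            with torch.no_grad():
                w -= lr * w.grad                  # gradient descent step
                w.grad.zero_()                    # reset for the next iteration
        print(w.item())                           # should end up close to 3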

  • Open

    Training a Linear Regression Model in PyTorch
    Linear regression is a simple yet powerful technique for predicting the values of variables based on other variables. It is often used for modeling relationships between two or more continuous variables, such as the relationship between income and age, or the relationship between weight and height. Likewise, linear regression can be used to predict continuous […] The post Training a Linear Regression Model in PyTorch appeared first on MachineLearningMastery.com.  ( 24 min )
    Making Linear Predictions in PyTorch
    Linear regression is a statistical technique for estimating the relationship between two variables. A simple example of linear regression is to predict the height of someone based on the square root of the person’s weight (that’s what BMI is based on). To do this, we need to find the slope and intercept of the line. […] The post Making Linear Predictions in PyTorch appeared first on MachineLearningMastery.com.  ( 21 min )

  • Open

    Loading and Providing Datasets in PyTorch
    Structuring the data pipeline in a way that it can be effortlessly linked to your deep learning model is an important aspect of any deep learning-based system. PyTorch packs everything to do just that. While in the previous tutorial, we used simple datasets, we’ll need to work with larger datasets in real world scenarios in […] The post Loading and Providing Datasets in PyTorch appeared first on MachineLearningMastery.com.  ( 20 min )

  • Open

    Using Dataset Classes in PyTorch
    In machine learning and deep learning problems, a lot of effort goes into preparing the data. Data is usually messy and needs to be preprocessed before it can be used for training a model. If the data is not prepared correctly, the model won’t be able to generalize well. Some of the common steps required […] The post Using Dataset Classes in PyTorch appeared first on MachineLearningMastery.com.  ( 21 min )
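    A compact sketch of the pattern these dataset tutorials walk through: wrap the samples in a Dataset subclass and let DataLoader handle batching and shuffling.

        import torch
        from torch.utils.data import Dataset, DataLoader

        class ToyDataset(Dataset):
            def __init__(self, n=100):
                self.x = torch.linspace(-1, 1, n).unsqueeze(1)
                self.y = 2 * self.x + 1

            def __len__(self):
                return len(self.x)

            def __getitem__(self, idx):
                return self.x[idx], self.y[idx]   # one (input, target) pair

        loader = DataLoader(ToyDataset(), batch_size=16, shuffle=True)
        for xb, yb in loader:
            pass                                  # each iteration yields a shuffled mini-batch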

  • Open

    Calculating Derivatives in PyTorch
    Derivatives are one of the most fundamental concepts in calculus. They describe how changes in the variable inputs affect the function outputs. The objective of this article is to provide a high-level introduction to calculating derivatives in PyTorch for those who are new to the framework. PyTorch offers a convenient way to calculate derivatives for […] The post Calculating Derivatives in PyTorch appeared first on Machine Learning Mastery.  ( 20 min )
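    The essence of the tutorial in a few lines (a minimal sketch): mark a tensor with requires_grad, call backward(), and read the derivative off .grad.

        import torch

        x = torch.tensor(2.0, requires_grad=True)
        y = x ** 3 + 2 * x        # y = x^3 + 2x
        y.backward()              # compute dy/dx at x = 2
        print(x.grad)             # tensor(14.) since 3*2^2 + 2 = 14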

  • Open

    Two-Dimensional Tensors in Pytorch
    Two-dimensional tensors are analogous to two-dimensional matrices. Like a two-dimensional matrix, a two-dimensional tensor also has $n$ number of rows and columns. Let’s take a gray-scale image as an example, which is a two-dimensional matrix of numeric values, commonly known as pixels. Ranging from ‘0’ to ‘255’, each number represents a pixel intensity value. Here, […] The post Two-Dimensional Tensors in Pytorch appeared first on Machine Learning Mastery.  ( 21 min )

  • Open

    One-Dimensional Tensors in Pytorch
    PyTorch is an open-source deep learning framework based on Python language. It allows you to build, train, and deploy deep learning models, offering a lot of versatility and efficiency. PyTorch is primarily focused on tensor operations while a tensor can be a number, matrix, or a multi-dimensional array. In this tutorial, we will perform some […] The post One-Dimensional Tensors in Pytorch appeared first on Machine Learning Mastery.  ( 22 min )

  • Open

    365 Data Science courses free until November 21
    Sponsored Post   The unlimited access initiative presents a risk-free way to break into data science.     The online educational platform 365 Data Science launches the #21DaysFREE campaign and provides 100% free unlimited access to all content for three weeks. From November 1 to 21, you can take courses from renowned instructors and earn […] The post 365 Data Science courses free until November 21 appeared first on Machine Learning Mastery.  ( 15 min )

  • Open

    Attend the Data Science Symposium 2022, November 8 in Cincinnati
    Sponsored Post      Attend the Data Science Symposium 2022 on November 8 The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all day in-person event will have three featured speakers and two tech talk tracks with four concurrent presentations in each track. The […] The post Attend the Data Science Symposium 2022, November 8 in Cincinnati appeared first on Machine Learning Mastery.  ( 10 min )

  • Open

    My family's unlikely homeschooling journey
    My husband Jeremy and I never intended to homeschool, and yet we have now, unexpectedly, committed to homeschooling long-term. Prior to the pandemic, we both worked full-time in careers that we loved and found meaningful, and we sent our daughter to a full-day Montessori school. Although I struggled with significant health issues, I felt unbelievably lucky and fulfilled in both my family life and my professional life. The pandemic upended my careful balance. Every family is different, with different needs, circumstances, and constraints, and what works for one may not work for others. My intention here is primarily to share the journey of my own (very privileged) family. Our unplanned introduction to homeschooling For the first year of the pandemic, most schools in California, where …  ( 7 min )

  • Open

    The Jupyter+git problem is now solved
    Jupyter notebooks don’t work with git by default. With nbdev2, the Jupyter+git problem has been totally solved. It provides a set of hooks which provide clean git diffs, solve most git conflicts automatically, and ensure that any remaining conflicts can be resolved entirely within the standard Jupyter notebook environment. To get started, follow the directions on Git-friendly Jupyter. Contents: The Jupyter+git problem · The solution · The nbdev2 git merge driver · The nbdev2 Jupyter save hook · Background · The result · Postscript: other Jupyter+git tools (ReviewNB, an alternative solution: Jupytext, nbdime). The Jupyter+git problem: Jupyter notebooks are a powerful tool for scientists, engineers, technical writers, students, teachers, and more. They provide an ideal notebook environment for interact…  ( 7 min )
2023-10-14T00:41:51.277Z osmosfeed 1.15.1